Commit Graph

84 Commits

Author SHA1 Message Date
Celes Renata 19f39301d3 fix: skip superset/trino ingress when hostname is empty (fixes Kargo beta promotion) 2026-04-21 05:09:08 +00:00
Celes Renata 9275351279 fix: beta stage uses paper DB to resolve permission denied errors
Beta was pointing at stonks_beta DB where tables were owned by postgres
superuser, causing permission denied for the stonks app user. Switch to
sharing stonks_paper DB/user (already has proper grants). DEPLOY_STAGE=beta
still isolates Redis keys and MinIO buckets. Added market data API key
so beta can test ingestion when pipeline is toggled ON.
2026-04-21 04:57:29 +00:00
Celes Renata 5d0635a291 feat: beta deploys all services with pipeline toggle defaulting to OFF
- pipelineEnabled: true in beta so all pods run (Kargo happy)
- PIPELINE_DEFAULT_OFF=true in beta config — scheduler initializes
  the Redis toggle to OFF on first boot
- Shared Ollama (10.1.1.12:2701) between beta and paper
- Flip pipeline ON from the UI when testing, OFF when done
- Optimistic UI update for the toggle button
2026-04-21 03:54:00 +00:00
Celes Renata 6703b20b59 fix: wait for infra pods to exist before checking readiness
kubectl wait fails immediately with 'no matching resources found' if
pods haven't been created yet. Added a poll loop to wait for all 3
infra pods (postgres, redis, minio) to exist before running wait.
2026-04-21 02:42:20 +00:00
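Illustration for 6703b20b59: a minimal sketch of the poll-then-wait idea, written in Python rather than the pipeline's own shell. The function name `wait_until_exist`, the `count_pods` callable, and the timing defaults are hypothetical. In a real pipeline `count_pods` would wrap a pod listing call, and `kubectl wait` would run only after this returns True.

```python
import time
from typing import Callable


def wait_until_exist(
    count_pods: Callable[[], int],
    expected: int = 3,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until at least `expected` pods exist, or give up after `timeout_s`.

    Returns True once enough pods are visible, False on timeout.
    """
    waited = 0.0
    while True:
        if count_pods() >= expected:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Stand-in for the cluster: pods appear one at a time on successive polls.
_seen = iter([0, 1, 2, 3])
ready_to_wait = wait_until_exist(lambda: next(_seen), sleep=lambda _s: None)
```

Separating "do the pods exist" from "are the pods ready" keeps the readiness wait from failing on an empty match.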
Celes Renata c166aafc40 fix: reduce integration test timeout to 5 minutes
Tests complete in ~7s. The 10-minute timeout was causing unnecessary
wait time on failures. Reduced Job activeDeadlineSeconds and kubectl
wait timeout to 300s.
2026-04-21 01:52:11 +00:00
Celes Renata be526ae614 feat: pipeline on/off toggle with per-stage Helm control
- Added pipelineEnabled flag to Helm values (default: true)
- Worker services (scheduler, ingestion, parser, extractor, aggregation,
  recommendation, broker-adapter, lake-publisher) scale to 0 when disabled
- API services always run regardless of toggle
- Redis-based runtime toggle: POST /api/ops/pipeline/toggle
- Scheduler checks the flag before each cycle
- Frontend: green/red Pipeline ON/OFF button on the pipeline page
- Beta defaults to pipelineEnabled: false
- Base values.yaml: blanked external URLs (Ollama, Polygon, Alpaca)
  so stages only connect to what they explicitly configure
2026-04-21 00:21:53 +00:00
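Illustration for be526ae614: a rough sketch of a runtime toggle that a scheduler checks before each cycle. A plain dict stands in for Redis here, and the key name `pipeline:enabled`, the "1"/"0" encoding, and all function names are assumptions made for the example, not taken from the repository.

```python
from typing import Callable, MutableMapping

TOGGLE_KEY = "pipeline:enabled"  # hypothetical key name


def pipeline_enabled(store: MutableMapping[str, str]) -> bool:
    """Treat a missing key as enabled, mirroring a default-on flag."""
    return store.get(TOGGLE_KEY, "1") == "1"


def toggle(store: MutableMapping[str, str]) -> bool:
    """Flip the flag and return the new state."""
    new_state = not pipeline_enabled(store)
    store[TOGGLE_KEY] = "1" if new_state else "0"
    return new_state


def run_cycles(store: MutableMapping[str, str], cycle: Callable[[], None], count: int) -> int:
    """Run up to `count` cycles, skipping any cycle where the flag is off."""
    ran = 0
    for _ in range(count):
        if pipeline_enabled(store):
            cycle()
            ran += 1
    return ran


store: MutableMapping[str, str] = {}
ran_while_on = run_cycles(store, lambda: None, 2)
state_after_toggle = toggle(store)
ran_while_off = run_cycles(store, lambda: None, 2)
```

Checking the flag at the top of every cycle means a toggle takes effect at the next cycle boundary without restarting any pod.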
Celes Renata a19ed086fe fix: blank external URLs in base values, set only in stage overrides
Base values.yaml now has empty OLLAMA_BASE_URL, MARKET_DATA_BASE_URL,
and BROKER_PROVIDER. Only paper (and eventually live) set the real
URLs. Beta inherits empty defaults so it can't reach external services.
2026-04-21 00:15:22 +00:00
Celes Renata d8fce71178 fix: disable pipeline workers and blank API keys in beta stage
Beta is for API testing only. Scale scheduler, ingestion, parser,
extractor, aggregation, recommendation, broker-adapter, and
lake-publisher to 0 replicas. Blank out Polygon and Alpaca keys.
Infra secrets (postgres, redis, minio) kept so API services work.
2026-04-21 00:14:02 +00:00
Celes Renata 7eab50fda9 fix: disable all external connections and pipeline workers in beta
Beta is for API testing only. Blanked out Polygon/Alpaca/Ollama
credentials, set OLLAMA_BASE_URL to localhost:99999, and scaled
scheduler/ingestion/parser/extractor/aggregation/recommendation/
broker-adapter/lake-publisher to 0 replicas.
2026-04-21 00:13:42 +00:00
Celes Renata 20faa8e20d fix: bake secrets into values-paper.yaml and auto-seed on empty DB
- All paper stage credentials now in values-paper.yaml so ArgoCD
  renders them correctly on every sync (no more empty secrets)
- Added seed-if-empty init container to scheduler: runs the seed
  script if the companies table is empty after migrations
2026-04-20 17:40:41 +00:00
Celes Renata 740ddc1c54 fix: revert extractor to 1 replica (single GPU bottleneck) 2026-04-20 12:16:51 +00:00
Celes Renata f1f0b7e34c fix: scale extractor to 3 replicas in paper stage
The extraction queue had 3000+ SEC filings backed up with a single
extractor pod processing them at 10-115s each. Ollama handles
concurrent requests so multiple extractor pods can share the GPU.
2026-04-20 10:59:05 +00:00
Celes Renata 5289f0f195 fix: use kubectl wait for job completion detection in inttest pipeline
The polling loop checked conditions[0].type which missed the Complete
condition when it wasn't at index 0. Switch to kubectl wait
--for=condition=complete which handles condition matching reliably.
2026-04-20 07:47:10 +00:00
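Illustration for 5289f0f195: the commit itself switches to `kubectl wait`, but the underlying bug (reading only `conditions[0]`) can be shown with a short Python sketch that scans every condition instead. The function name `job_outcome` and the sample status document are hypothetical.

```python
from typing import Any, Optional


def job_outcome(job: dict[str, Any]) -> Optional[str]:
    """Return "Complete" or "Failed" if the Job has that condition set to True.

    Scans every entry in status.conditions instead of assuming index 0.
    """
    conditions = job.get("status", {}).get("conditions") or []
    for cond in conditions:
        if cond.get("status") == "True" and cond.get("type") in ("Complete", "Failed"):
            return cond["type"]
    return None


# Hypothetical Job status where Complete is not the first condition.
sample_job = {
    "status": {
        "conditions": [
            {"type": "SuccessCriteriaMet", "status": "True"},
            {"type": "Complete", "status": "True"},
        ]
    }
}
outcome = job_outcome(sample_job)
```

A check of `conditions[0].type` alone would have seen only the first entry in this sample and kept polling.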
Celes Renata 422326bf83 fix: inttest pipeline timeout and busybox grep compatibility
- Poll job status instead of kubectl wait (catches Failed condition
  immediately instead of waiting 600s for Complete that never comes)
- Replace grep -oP (Perl regex) with POSIX grep -o (BusyBox compat)
2026-04-20 04:23:19 +00:00
Celes Renata f3aac0ac3d fix: superset config uses POSTGRES_DB and REDIS_DB env vars for stage isolation 2026-04-19 23:49:11 +00:00
Celes Renata 0f2f0460a6 fix: dedicated scheduler Dockerfile with psql for migrations, remove Python splitter 2026-04-19 23:35:00 +00:00
Celes Renata 48fed18078 feat: per-stage PostgreSQL users for database isolation (stonks_beta, stonks_paper) 2026-04-19 23:17:22 +00:00
Celes Renata 47f10cd3cf fix: use Python asyncpg migration runner instead of psql, remove postgresql-client from image 2026-04-19 22:54:01 +00:00
Celes Renata 021efba294 feat: auto-run migrations via psql init container on scheduler startup 2026-04-19 22:37:50 +00:00
Celes Renata 5c63264393 feat: stage-isolated infrastructure — separate Postgres DBs, Redis DBs, and MinIO bucket prefixes per stage 2026-04-19 22:20:03 +00:00
Celes Renata 2621b3c5c5 feat: add stage-specific ingress hostnames for beta and paper 2026-04-19 22:00:47 +00:00
Celes Renata 827be709df fix: use Recreate strategy for hive-metastore and superset (RWO PVC) 2026-04-19 20:41:22 +00:00
Celes Renata a9be904afe fix: guard ghcr-secret template against nil ghcrAuth values 2026-04-19 19:51:29 +00:00
Celes Renata 1f69a27e3b fix: replace mktemp with PID-based temp path for BusyBox compat
BusyBox mktemp in alpine/k8s doesn't support .json suffix in template.
The mktemp failure triggered set -e, causing pipeline to report failure
despite all 93 tests passing.
2026-04-19 19:35:02 +00:00
Celes Renata 4df513d096 fix: remove bucket-init job, wait for pods before readiness check
- Remove minio-bucket-init Job entirely (seed_minio.py creates bucket)
- Wait for pods to exist before kubectl wait --for=condition=ready
- Fixes 'no matching resources found' race when pods are still ContainerCreating
2026-04-19 19:25:49 +00:00
Celes Renata b2b8aca7c6 fix: inttest runner crash and minio bucket-init proxy issue
- Remove --profiling-output arg from runner.yaml (plugin uses default path)
- Inline profiling hooks in root conftest.py with graceful fallback
- Replace mc-based bucket-init with Python urllib (no proxy interference)
- Add explicit ProxyHandler({}) to guarantee no proxy usage in bucket-init
2026-04-19 19:15:20 +00:00
Celes Renata 4ebf75134f ci: clear proxy env in minio-bucket-init, capture seed pod logs on failure 2026-04-19 08:55:52 +00:00
Celes Renata 5be3ce2db9 feat: migrate CI/CD from GHCR to local Harbor registry
- Makefile: GHCR -> registry.celestium.life/stonks-oracle
- GitHub Actions: login to Harbor, use HARBOR_PASSWORD secret
- infra/k8s/*.yaml: all image refs -> registry.celestium.life
- inttest pipeline: remove GHCR pull secret (local registry, no auth)
- Steering docs: update registry/git endpoints
2026-04-19 07:34:28 +00:00
Celes Renata c2372ccd1e ci: add NO_PROXY to minio-bucket-init to bypass proxy for internal services 2026-04-19 07:02:27 +00:00
Celes Renata 2d40d70975 ci: remove remaining ghcr-credentials from inttest seed/minio pod overrides 2026-04-19 06:45:46 +00:00
Celes Renata ebafe795c1 fix: bump seed pod timeout to 5m and add debug diagnostics on pipeline failures 2026-04-19 06:34:58 +00:00
Celes Renata 19b63dd369 ci: migrate inttest images from GHCR to local registry, remove ghcr-credentials 2026-04-19 06:22:35 +00:00
Celes Renata e3e1531847 ci: add Docker Hub auth + proxy CA to inttest namespace, fix MinIO pull secret 2026-04-19 06:09:56 +00:00
Celes Renata 5f6d23888a ci: fix lint errors across project, update ruff.toml per-file ignores 2026-04-18 21:02:28 +00:00
Celes Renata c85c0068a2 fix: clean up utcnow deprecation warnings, fix 12 failing tests, add CI/CD pipeline manifests
- Replace all datetime.utcnow() with datetime.now(tz=timezone.utc) across 8 files
- Fix 12 failing tests to match current implementation behavior
- Fix pytest_plugins in non-top-level conftest (moved to root conftest.py)
- Auto-fix 189 lint issues (import sorting, unused imports)
- Add CI/CD pipeline infrastructure (ARC, ArgoCD, Kargo manifests)
- Add values-beta.yaml and values-paper.yaml for staged deployments
- Update GitHub Actions workflow to use self-hosted-gremlin runners
- Add integration-test job to CI pipeline

Result: 1596 passed, 0 failed, 0 warnings
2026-04-18 03:59:28 +00:00
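Illustration for c85c0068a2: the `datetime.utcnow()` replacement named in the first bullet, shown in isolation. The variable names are only for the example.

```python
from datetime import datetime, timezone

# Timezone-aware replacement for the deprecated datetime.utcnow().
now_aware = datetime.now(tz=timezone.utc)

# An aware value carries its UTC offset, so it serializes unambiguously.
stamp = now_aware.isoformat()
```

`datetime.utcnow()` returns a naive value with no `tzinfo`, which is why Python 3.12 and later emit a deprecation warning for it.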
Celes Renata 7a541aa693 fix: trend_windows.id missing gen_random_uuid() default 2026-04-17 17:27:41 +00:00
Celes Renata 84b3c06b2f fix: sync agent DB prompts with code defaults + update model to qwen3.5:9b-fast 2026-04-17 17:12:09 +00:00
Celes Renata c5b7bddadb fix: backfill recommendation evidence for existing recommendations
Migration 028: For each recommendation with no evidence rows, finds
the closest matching trend_window (by ticker + time_horizon + timestamp)
and re-inserts evidence from top_supporting/opposing_evidence arrays.
Filters out non-UUID pattern IDs and verifies documents exist.

This fixes 'No evidence linked' on recommendations created before the
UUID filtering fix in persist_recommendation.
2026-04-17 07:37:14 +00:00
Celes Renata 7c23c044d7 feat: agent variants — migration, API, service integration, frontend, tests
- Migration 027: agent_variants table with single-active enforcement,
  variant_id column on agent_performance_log
- API: full CRUD, clone from agent/variant, activate/deactivate,
  per-variant performance metrics and history endpoints
- Services: extractor, event classifier, thesis rewriter all wired
  to AgentConfigResolver with variant override support
- Frontend: variant list, comparison view, create/edit/clone forms,
  activate/delete actions on Agents page
- Tests: API tests + 5 property-based tests (single-active invariant,
  clone preservation, config resolution, slug determinism, update idempotence)
- Spec files for agent-variants feature
2026-04-17 05:15:42 +00:00
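Illustration for 7c23c044d7: one way to read "single-active enforcement" is that activating a variant deactivates every other variant of the same agent. The sketch below models that in memory. The `Variant` dataclass, its fields, and `activate` are hypothetical, and the repository may enforce the invariant in the database instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    id: str
    agent_id: str
    active: bool = False


def activate(variants: List[Variant], variant_id: str) -> None:
    """Mark one variant active and deactivate its siblings for the same agent."""
    target = next(v for v in variants if v.id == variant_id)
    for v in variants:
        if v.agent_id == target.agent_id:
            v.active = v.id == variant_id


def active_count(variants: List[Variant], agent_id: str) -> int:
    """Count active variants for one agent; the invariant is that this is at most 1."""
    return sum(1 for v in variants if v.agent_id == agent_id and v.active)


variants = [
    Variant("v1", "extractor"),
    Variant("v2", "extractor"),
    Variant("v3", "classifier"),
]
activate(variants, "v1")
activate(variants, "v3")
activate(variants, "v2")  # v1 is deactivated, v3 is untouched
```

A property-based test of the invariant would apply random sequences of `activate` calls and assert that `active_count` never exceeds 1 for any agent.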
Celes Renata c501ccea40 fix: default model to qwen3.5:9b + improve event classifier prompt
- Migration 026 and OllamaConfig now default to qwen3.5:9b instead of
  llama3.1:8b. Existing deployments keep their current model (qwen3.5:9b-fast)
  since the migration uses WHERE NOT EXISTS on slug.

- Event classifier system prompt expanded with macro-vs-company filtering:
  explicitly instructs the model to NOT classify single-company news
  (lawsuits, earnings, management changes, debt crises) as macro events.
  Sets severity=low and confidence<0.3 for company-specific articles.
  Reserves 'critical' severity for multi-country/global market events.
  Prevents over-tagging event_types by requiring direct description.

- Updated test_system_prompt_is_concise threshold to accommodate the
  expanded prompt (300 → 1000 chars).
2026-04-17 02:53:38 +00:00
Celes Renata 45752b9a29 feat: AI Agents management page with per-agent performance tracking
New Agents tab in the sidebar (Ops group) for viewing, editing, and
creating AI agent configurations:

Database (migration 026):
- ai_agents table: editable configs for each LLM agent (model, prompts,
  temperature, tokens, retries). source='system' for built-in,
  source='user' for custom. Seeds 3 system agents (Document Extractor,
  Event Classifier, Thesis Rewriter) using WHERE NOT EXISTS to never
  overwrite user edits across reinstalls.
- agent_performance_log table: per-invocation metrics (duration,
  confidence, retries, tokens, errors) linked to agent config.

API endpoints:
- GET/POST /api/agents — list and create agents
- GET/PUT/DELETE /api/agents/{id} — view, edit, delete (system agents
  can be edited but not deleted)
- GET /api/agents/{id}/performance — aggregated metrics (success rate,
  avg/p95 latency, confidence, token usage)
- GET /api/agents/{id}/performance/history — hourly time series

Frontend:
- AgentsPage with sidebar list + detail panel
- Agent detail: config display, system prompt viewer, performance
  dashboard with metrics cards and time-series chart
- Edit form: all config fields editable including system prompt,
  model, temperature, tokens, retries
- Create form: new user-defined agents with auto-slug generation
- System agents show blue badge, user agents show green badge
2026-04-17 01:24:35 +00:00
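Illustration for 45752b9a29: the WHERE NOT EXISTS seeding pattern, reproduced with Python's built-in `sqlite3` so it runs without a server. The real migration targets PostgreSQL. The three-column table, the slug `document-extractor`, and the `seed` helper are simplifications assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ai_agents (slug TEXT PRIMARY KEY, model TEXT, source TEXT)")

# Insert a system agent only if no row with that slug exists yet.
SEED_SQL = """
INSERT INTO ai_agents (slug, model, source)
SELECT :slug, :model, 'system'
WHERE NOT EXISTS (SELECT 1 FROM ai_agents WHERE slug = :slug)
"""


def seed(slug: str, model: str) -> None:
    conn.execute(SEED_SQL, {"slug": slug, "model": model})


seed("document-extractor", "qwen3.5:9b")

# Simulate a user edit between installs.
conn.execute("UPDATE ai_agents SET model = 'qwen3.5:9b-fast' WHERE slug = 'document-extractor'")

# Re-running the seed must neither duplicate the row nor overwrite the edit.
seed("document-extractor", "qwen3.5:9b")

model_after = conn.execute(
    "SELECT model FROM ai_agents WHERE slug = 'document-extractor'"
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM ai_agents").fetchone()[0]
```

Because the guard is on `slug` only, a changed default in the seed never reaches deployments that already have the row.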
Celes Renata 86b549e5e1 fix: migrations preserve trend history across reinstalls
Migration 023 was deleting all but the latest trend_windows row per
entity before 024 could save them to trend_history. On reinstall,
this wiped the entire history every time.

Fixed by restructuring:
- 023 now creates trend_history FIRST and copies all trend_windows
  rows into it before deduplicating trend_windows down to latest-only.
  Uses NOT EXISTS to avoid duplicating rows on re-runs.
- 024 is now idempotent: ensures table/indexes exist and backfills
  from recommendations (last 7 days, 1 point per ticker/window/hour)
  to reconstruct approximate history even if trend_windows was sparse.

Both migrations are safe to re-run on existing databases.
2026-04-17 01:15:28 +00:00
Celes Renata 2360c501e4 feat: intraday hourly price bars via Polygon range endpoint
- New 'intraday_bars' endpoint in PolygonMarketAdapter: fetches hourly
  bars for today using range_bars URL with timespan=hour, sort=asc
- Scheduler expands intraday_bars global source into per-ticker jobs
  for all active companies (every 15 minutes via polling_interval)
- Migration 025 inserts the intraday source with 900s cadence
- Frontend price matching uses closest-timestamp instead of date-string
  matching, with 2h tolerance for intraday and 36h for daily windows
- Bumped market price fetch limit to 200 for intraday granularity
2026-04-17 01:13:24 +00:00
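Illustration for 2360c501e4: closest-timestamp price matching with a tolerance, sketched in Python (the actual change is in the frontend). The function name `closest_price`, the `Bar` tuple layout, and the sample prices are made up for the example. Only the 2h and 36h tolerances come from the commit message.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence, Tuple

Bar = Tuple[datetime, float]  # (bar timestamp, close price)


def closest_price(target: datetime, bars: Sequence[Bar], tolerance: timedelta) -> Optional[float]:
    """Return the price of the bar nearest to `target`, or None if it is too far away."""
    if not bars:
        return None
    ts, price = min(bars, key=lambda bar: abs(bar[0] - target))
    return price if abs(ts - target) <= tolerance else None


INTRADAY_TOLERANCE = timedelta(hours=2)
DAILY_TOLERANCE = timedelta(hours=36)

bars = [
    (datetime(2026, 4, 16, 14, 0, tzinfo=timezone.utc), 101.0),
    (datetime(2026, 4, 16, 15, 0, tzinfo=timezone.utc), 102.5),
]

# 14:40 is 20 minutes from the 15:00 bar and 40 minutes from the 14:00 bar.
matched = closest_price(datetime(2026, 4, 16, 14, 40, tzinfo=timezone.utc), bars, INTRADAY_TOLERANCE)

# 19:30 is 4.5 hours from the nearest bar: outside the intraday tolerance, inside the daily one.
late = datetime(2026, 4, 16, 19, 30, tzinfo=timezone.utc)
unmatched = closest_price(late, bars, INTRADAY_TOLERANCE)
matched_daily = closest_price(late, bars, DAILY_TOLERANCE)
```

Matching on date strings breaks once several bars share a date, which is the case for hourly data. Nearest-timestamp matching with a per-granularity tolerance avoids that.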
Celes Renata 7c589353f8 fix: blank company charts + competitor GUIDs instead of tickers
Trend charts blank:
- trend_windows uses upsert (1 row per ticker/window), so charts had
  at most 1 data point. Added trend_history table (migration 024) that
  appends every snapshot. New /api/trends/history endpoint serves the
  time series. Frontend now uses useTrendHistory for charts and
  useTrends for the latest summary card.

Competitor GUIDs:
- list_competitors query returned raw company_b_id UUIDs without
  joining companies table. Added LEFT JOIN with CASE to resolve the
  other company's ticker and legal_name. Updated Pydantic model to
  include enriched fields. Frontend fallback changed from truncated
  UUID to ticker/legal_name/Unknown.
2026-04-17 00:42:55 +00:00
Celes Renata 3a856cf6ff fix: reduce Ollama timeout from 300s to 240s (4 min) 2026-04-16 18:43:50 +00:00
Celes Renata 6bab199159 fix: trend_windows now upserts instead of accumulating (7.5GB→4MB), add competitive signal retention cleanup 2026-04-16 14:32:24 +00:00
Celes Renata 58a8726306 feat: add paper trading capital controls — API endpoint + UI with presets, fix status/metrics to read real state, fix migration duplicates 2026-04-16 14:06:30 +00:00
Celes Renata 540d54c3f7 feat: scale aggregation to 4 replicas across cluster nodes 2026-04-16 09:26:22 +00:00
Celes Renata 0b3ab4ed90 feat: add 11 new saved queries, fix window quoting and cross-table join in samples 2026-04-16 07:14:44 +00:00
Celes Renata f1e32e9186 fix: add round(double precision, integer) overload so ad-hoc queries work without ::numeric casts 2026-04-16 07:10:23 +00:00