64 Commits

Author SHA1 Message Date
Celes Renata b478022ba3 fix: data quality query and suppression fallback in recommendation worker
- Fix _DATA_QUALITY_QUERY: remove nonexistent d.source_id/s.source_class,
  use d.source_type directly
- Fix LIMIT 1 applied after jsonb expansion by restructuring as CTE
- Fix fallback build_quality_context_from_summary returning empty
  source_types which always triggered LOW_SOURCE_DIVERSITY suppression
- Update test to reflect corrected fallback behavior
2026-04-14 06:57:46 +00:00
Celes Renata 4fbddc307a fix(extractor): fallback for any unrecognized impact_horizon value 2026-04-12 16:27:37 -07:00
Celes Renata 6ae8aa779e fix(extractor): add underscore variants to impact_horizon normalizer
Model returns long_term/short_term/medium_term instead of hyphenated versions
2026-04-12 16:08:25 -07:00
Celes Renata cd782d1552 fix(extractor): streaming with guardrails + catalyst_type normalization
- Switch Ollama calls from non-streaming to streaming with early termination
- Add loop detection, max token limit, and stall timeout guards
- Add catalyst_type alias normalizer to handle model hallucinations
- Add explicit enum values in extraction prompt for catalyst_type
- Add streaming config knobs to OllamaConfig
2026-04-12 15:28:20 -07:00
Celes Renata 527be42f82 phase 17: restore script — per-bucket size reporting, full DB row count verification 2026-04-12 14:53:19 -07:00
Celes Renata 85e0ef7580 phase 17: backup script now reports per-bucket sizes and file counts 2026-04-12 14:46:47 -07:00
Celes Renata dd13045ca6 phase 17: fix backup/restore scripts — use postgres:18-alpine for pg_dump version match 2026-04-12 14:41:08 -07:00
Celes Renata 3f5b4adcec phase 17: add backup/restore scripts — PostgreSQL + MinIO → NFS 2026-04-12 14:21:59 -07:00
Celes Renata 6e2f174b19 phase 17: disable qwen3.5 thinking mode (think:false) to reduce latency and improve structured output 2026-04-12 12:35:24 -07:00
Celes Renata 45f0c03639 phase 17: add request-level URL logging to OllamaClient for proxy debugging 2026-04-12 12:32:44 -07:00
Celes Renata fd35e12d5e phase 17: switch Ollama to external proxy at 10.1.1.12:2701 2026-04-12 11:37:23 -07:00
Celes Renata 80e0f0976f phase 17: switch back to qwen3.5:9b-fast (Ollama restarted, model available again) 2026-04-12 11:05:42 -07:00
Celes Renata a3e8009fa9 phase 17: revert to qwen3.5:9b (9b-fast was removed from Ollama), add retry script 2026-04-12 10:58:53 -07:00
Celes Renata 109a2485cf phase 17: increase Ollama timeout to 300s for qwen3.5:9b-fast 32k context 2026-04-12 10:32:13 -07:00
Celes Renata 7ee1d0f050 phase 17: switch to qwen3.5:9b-fast (32k context), add queue management scripts 2026-04-12 10:19:28 -07:00
Celes Renata 1993bfdf3e phase 17: add extraction output normalization — clamp scores to 0-1, map impact_horizon alternatives 2026-04-12 10:15:38 -07:00
Celes Renata 608ccc8b68 phase 17: revert to qwen3.5:9b, keep improved prompt style 2026-04-12 10:06:13 -07:00
Celes Renata 66ed38bf18 phase 17: switch to gemma4:e4b, rewrite prompts for fill-the-fields style with forced ticker inclusion 2026-04-12 10:05:31 -07:00
Celes Renata 2e42310f07 phase 17: fix SEC EDGAR 403 — use descriptive User-Agent with contact email per fair access policy 2026-04-12 09:50:29 -07:00
Celes Renata 311d76dc0b phase 17: enrich SEC EDGAR filings with URLs, titles, dedupe by accession number, skip XML fragments 2026-04-12 09:42:12 -07:00
Celes Renata 28b3361833 phase 17: remove embedded JSON schema from user prompt (4.7KB saved), Ollama format param handles it 2026-04-12 09:28:28 -07:00
Celes Renata 57d0fc7d33 phase 17: pass all tracked tickers to extractor, soften prompt for macro-to-company relevance 2026-04-12 09:18:08 -07:00
Celes Renata 59f89d03d2 phase 17: enrich short parsed articles with Polygon description/keywords from raw payload 2026-04-12 08:52:46 -07:00
Celes Renata cd32c3e3fe phase 17: increase parser→extractor text limit from 8k to 32k chars 2026-04-12 08:37:29 -07:00
Celes Renata ffcc66ae0b phase 17: sync standalone k8s/trino.yaml with Helm template (native S3, s3.region) 2026-04-12 08:24:04 -07:00
Celes Renata 34787ad825 phase 17: fix Trino hive catalog — use native S3 filesystem, remove defunct hive.s3 props 2026-04-12 08:18:18 -07:00
Celes Renata 999648d90b phase 17: add s3.region to Trino catalog config for MinIO (fixes AWS SDK region error) 2026-04-12 08:16:14 -07:00
Celes Renata 4f2f113cda phase 17: fix text[]/varchar[] type mismatch in coverage-gaps SQL 2026-04-12 04:15:00 -07:00
Celes Renata d16e15c885 phase 17: quote reserved word 'window' in all SQL queries across recommendation worker and query API 2026-04-12 03:45:51 -07:00
Celes Renata 181ed2b6cd phase 17: quote reserved word 'window' in aggregation SQL INSERT 2026-04-12 03:38:45 -07:00
Celes Renata 019eaa40d7 phase 17: fix datetime JSON serialization in aggregation worker market_context 2026-04-12 03:31:32 -07:00
Celes Renata 48bf4f7e7e phase 17: extractor fetches normalized text from MinIO when not in job payload 2026-04-12 03:24:10 -07:00
Celes Renata 012b973bb7 phase 17: wire extractor→aggregation→recommendation queue chain, add company_id_map to extractor 2026-04-12 03:16:27 -07:00
Celes Renata 226cc3ff44 phase 17: switch Ollama model to qwen3.5:9b (available on cluster) 2026-04-12 03:10:49 -07:00
Celes Renata e4a1d2d69a phase 17: fix HTML parser NoneType attrs crash in boilerplate stripping 2026-04-12 03:03:07 -07:00
Celes Renata 264b83ea56 phase 17: fix Polygon article_url and published_utc field mapping in metadata persistence 2026-04-12 02:58:30 -07:00
Celes Renata 0ac4493bd4 phase 17: fix parser URL lookup from DB and extractor text field name mismatch 2026-04-12 02:54:23 -07:00
Celes Renata 67cdb0b8c8 phase 17: fix ruff lint error in scheduler import order 2026-04-12 02:47:47 -07:00
Celes Renata f2b9d6c00a phase 17: fix scheduler config parsing, worker entry points, and seed data for Polygon sources 2026-04-12 02:45:37 -07:00
Celes Renata 1410d324b7 phase 17: add vertical slice tasks for live pipeline activation 2026-04-11 20:43:47 -07:00
Celes Renata 37d5f9b01c update steering docs and hooks for current project state 2026-04-11 20:41:57 -07:00
Celes Renata 99e17be282 phase 16: fix env var fallback - use || instead of ?? for empty string 2026-04-11 20:06:13 -07:00
Celes Renata 4d0c38bba7 phase 16: vitest + MSW frontend tests, CI integration 2026-04-11 19:28:38 -07:00
Celes Renata 5758a704ec phase 16: fix UUID serialization in symbol registry responses 2026-04-11 19:22:13 -07:00
Celes Renata 6f5b2231a2 phase 16: add registry/risk nginx proxies, add company form, network policies 2026-04-11 19:12:07 -07:00
Celes Renata 4cd8961db6 phase 16: add dashboard network policy, allow query-api from dashboard 2026-04-11 18:20:48 -07:00
Celes Renata cc7014e33d phase 16: fix superset - trino driver in venv, psycopg2 metadata db, core secrets 2026-04-11 17:37:39 -07:00
Celes Renata 5f87cbe464 phase 16: custom superset image with trino driver, fix security context 2026-04-11 17:18:17 -07:00
Celes Renata afa627322a phase 16: fix ruff lint - move imports to top of file 2026-04-11 16:48:50 -07:00
Celes Renata 59da3fe89e phase 16: nginx-unprivileged on 8080, helm dashboard deployment 2026-04-11 16:37:59 -07:00
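
Commits 1993bfdf3e, 6ae8aa779e, and 4fbddc307a describe normalizing extractor output: clamping scores to 0-1, accepting underscore variants such as long_term, and falling back for any unrecognized impact_horizon value. A minimal sketch of that idea follows. It is not the repository's code: the function names, the chosen fallback value, and the handling of non-numeric scores are assumptions made for illustration.

```python
import math
from typing import Any

# Canonical hyphenated values, as implied by commit 6ae8aa779e.
CANONICAL_HORIZONS = ("short-term", "medium-term", "long-term")

# Hypothetical default: the log does not say which value the fallback uses.
FALLBACK_HORIZON = "medium-term"


def normalize_impact_horizon(value: Any) -> str:
    """Map case/underscore variants to a canonical value, else fall back."""
    if isinstance(value, str):
        candidate = value.strip().lower().replace("_", "-")
        if candidate in CANONICAL_HORIZONS:
            return candidate
    return FALLBACK_HORIZON


def clamp_score(value: Any) -> float:
    """Clamp a score into [0.0, 1.0].

    Non-numeric or NaN input becomes 0.0 (an assumption, not from the log).
    """
    try:
        score = float(value)
    except (TypeError, ValueError):
        return 0.0
    if math.isnan(score):
        return 0.0
    return max(0.0, min(1.0, score))
```

Applied to a model response, normalize_impact_horizon("long_term") yields "long-term", and clamp_score(1.7) yields 1.0, so downstream validation sees only in-range values.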