fix: clean up utcnow deprecation warnings, fix 12 failing tests, add CI/CD pipeline manifests

- Replace all datetime.utcnow() with datetime.now(tz=timezone.utc) across 8 files
- Fix 12 failing tests to match current implementation behavior
- Fix pytest_plugins in non-top-level conftest (moved to root conftest.py)
- Auto-fix 189 lint issues (import sorting, unused imports)
- Add CI/CD pipeline infrastructure (ARC, ArgoCD, Kargo manifests)
- Add values-beta.yaml and values-paper.yaml for staged deployments
- Update GitHub Actions workflow to use self-hosted-gremlin runners
- Add integration-test job to CI pipeline

Result: 1596 passed, 0 failed, 0 warnings
This commit is contained in:
Celes Renata
2026-04-18 03:59:28 +00:00
parent 40227a4eb2
commit c85c0068a2
123 changed files with 7221 additions and 405 deletions
+104
View File
@@ -373,6 +373,110 @@ All services read configuration from environment variables with sensible default
---
## 11. Integration Tests
The integration test pipeline validates all API endpoints against a live Kubernetes sandbox with realistic seed data. It deploys ephemeral infrastructure (PostgreSQL, Redis, MinIO), seeds deterministic test data, deploys all API services, and runs the full test suite with profiling.
### Prerequisites
- `kubectl` configured with access to a Kubernetes cluster
- Docker images built and pushed to GHCR (or use `:latest`)
- `envsubst` available (usually part of `gettext` package)
- `GHCR_TOKEN` environment variable set for image pulls (optional if images are public)
### Running the Full Pipeline
```bash
# Run with latest images
bash infra/inttest/run_pipeline.sh
# Run with a specific image tag
bash infra/inttest/run_pipeline.sh --image-tag abc123
# Keep the sandbox running for debugging
bash infra/inttest/run_pipeline.sh --skip-teardown
# Custom namespace and results file
bash infra/inttest/run_pipeline.sh --namespace my-test --results-file results.json
```
### CLI Options
| Option | Default | Description |
|--------|---------|-------------|
| `--image-tag TAG` | `latest` | Docker image tag to deploy |
| `--namespace NAME` | `stonks-inttest-<timestamp>` | Kubernetes namespace name |
| `--skip-teardown` | `false` | Leave namespace running after tests |
| `--results-file PATH` | `inttest-results.json` | Path for JSON results output |
### Exit Codes
| Code | Meaning |
|------|---------|
| 0 | All tests passed |
| 1 | One or more test failures |
| 2 | Infrastructure setup failure |
### JSON Result Contract
The pipeline produces a JSON results file (`inttest-results.json` by default) with this structure:
```json
{
"run_id": "stonks-inttest-1705312800",
"image_tag": "abc123",
"started_at": "2025-01-15T12:00:00Z",
"completed_at": "2025-01-15T12:07:30Z",
"exit_code": 0,
"stages": {
"infra_deploy": {"duration_s": 45, "status": "ok"},
"seed_data": {"duration_s": 8, "status": "ok"},
"service_deploy": {"duration_s": 32, "status": "ok"},
"integration_tests": {"duration_s": 28, "status": "ok"},
"teardown": {"duration_s": 5, "status": "ok"}
},
"tests": {"total": 41, "passed": 41, "failed": 0, "errors": 0},
"profiling": {
"endpoints": {"/api/companies": {"p50_ms": 12, "p95_ms": 25, "p99_ms": 45}},
"slow_endpoints": []
}
}
```
### Running Tests Locally (Development)
For faster iteration during development, you can run individual test files against local services:
```bash
# Start local services first (query-api on 8000, registry on 8001, etc.)
# Then run specific test files:
.venv/bin/python -m pytest tests/integration/test_query_api.py -v --tb=short
.venv/bin/python -m pytest tests/integration/test_registry_api.py -v --tb=short
.venv/bin/python -m pytest tests/integration/test_frontend_data_deps.py -v --tb=short
# Run with profiling output:
.venv/bin/python -m pytest tests/integration/ -v --profiling-output=profiling.json
```
Set the service URLs via environment variables:
```bash
export QUERY_API_URL=http://localhost:8000
export REGISTRY_API_URL=http://localhost:8001
export RISK_API_URL=http://localhost:8002
export TRADING_API_URL=http://localhost:8003
```
### Future: CI/CD Pipeline
This integration test runner is designed as a standalone foundation. A future CI/CD pipeline spec will consume it as one stage in a larger pipeline that includes:
- Self-hosted builds on gremlin nodes (no GitHub Actions compute costs)
- Staged promotion: beta → paper → live
- Market-hours promotion blockers (9:3016:00 ET)
- Break-glass emergency deploy to production
- Per-stage enable/disable toggles
---
## Troubleshooting
### "Connection refused" to PostgreSQL/Redis/MinIO
+535
View File
@@ -0,0 +1,535 @@
# From Model Output to Trade: The Full Pipeline
This document traces the complete journey of data through Stonks Oracle — from the moment an Ollama model produces structured JSON, through signal scoring and aggregation, to the final trading decision.
---
## 1. Document Ingestion
Before the model ever sees a document, the ingestion layer fetches raw content from configured sources (news APIs, SEC filings, earnings transcripts, press releases). Each document lands in the `documents` table with a status, type, and `published_at` timestamp. A Redis queue (`stonks:queue:extraction`) feeds documents to the extractor service.
---
## 2. Prompting the Model
The extractor service (`services/extractor/client.py`) sends each document to a local Ollama instance via `POST /api/chat`.
### System prompt
A short, strict instruction set:
> You are a financial document analyst. Extract structured data as JSON. Return ONLY a single JSON object. No markdown fences, no explanation, no text before or after the JSON. Every field in the schema is required. Use "other" for catalyst_type if unsure. Keep evidence_spans short (under 20 words each). Keep key_facts to 3-5 items max.
### User prompt
Built dynamically per document (`services/extractor/prompts.py`). It includes:
- **Document type guidance** — tailored instructions for articles, filings, transcripts, and press releases. For example, filings get: *"Extract concrete financial figures, risk factors, and material events as stated."* Transcripts get: *"Distinguish between management forward-looking statements and reported results."*
- **Tracked ticker hints** — the list of 50 tracked tickers, with rules: if a ticker appears verbatim in the text, the model must include it; if a sector theme clearly affects a tracked company, include it; never invent tickers outside the list.
- **Field-by-field instructions** — what each output field means and its valid range.
- **Document text** — truncated to 8,000 characters to keep inference fast.
### Ollama call parameters
- `think=false` (speed over chain-of-thought)
- `num_predict=4096` (max output tokens)
- Optional `num_ctx` override for longer documents
- The JSON schema (generated from Pydantic models) is passed as the `format` parameter for structured output
---
## 3. Model Output: The JSON Contract
The model returns a single JSON object matching the `ExtractionResult` schema (`services/extractor/schemas.py`):
```json
{
"summary": "Apple reported record Q4 earnings driven by iPhone 16 demand.",
"companies": [
{
"ticker": "AAPL",
"company_name": "Apple Inc.",
"relevance": 0.95,
"sentiment": "positive",
"impact_score": 0.8,
"impact_horizon": "1d_7d",
"catalyst_type": "earnings",
"key_facts": [
"Revenue up 12% YoY to $94.9B",
"iPhone revenue grew 18%",
"Services hit all-time high"
],
"risks": [
"China market softness noted by management"
],
"evidence_spans": [
"record quarterly revenue of $94.9 billion",
"iPhone revenue grew 18 percent year over year"
]
}
],
"macro_themes": ["consumer_spending", "ai_capex"],
"novelty_score": 0.6,
"confidence": 0.85,
"extraction_warnings": []
}
```
### Field definitions
| Field | Type | Range | Purpose |
|---|---|---|---|
| `summary` | string | — | 1-3 sentence document summary |
| `companies[]` | array | — | Per-company intelligence (one entry per affected company) |
| `.ticker` | string | — | Stock ticker symbol |
| `.relevance` | float | 0-1 | How central this company is to the document |
| `.sentiment` | enum | positive / negative / neutral / mixed | Overall sentiment toward the company |
| `.impact_score` | float | 0-1 | Estimated magnitude of impact (0 = negligible, 1 = highly material) |
| `.impact_horizon` | string | intraday / 1d / 1d_7d / 1d_30d / 30d_90d / 90d_plus | When the impact is expected to manifest |
| `.catalyst_type` | enum | earnings / product / legal / macro / supply_chain / m_and_a / rating_change / other | Primary catalyst category |
| `.key_facts` | string[] | — | Facts explicitly stated in the document (no fabrication) |
| `.risks` | string[] | — | Risks explicitly mentioned |
| `.evidence_spans` | string[] | — | Short verbatim quotes supporting the analysis |
| `macro_themes` | string[] | — | Broad economic themes (rates, inflation, ai_capex, etc.) |
| `novelty_score` | float | 0-1 | How surprising the information is |
| `confidence` | float | 0-1 | Model's self-assessed extraction quality |
| `extraction_warnings` | string[] | — | Issues encountered (ambiguous_ticker, incomplete_text, etc.) |
---
## 4. JSON Repair and Validation
The raw model output goes through two stages before it's trusted.
### 4a. JSON repair (`services/extractor/client.py`)
Ollama's `format` constraint is unreliable with `think=false` on certain models (Ollama bug #14645). The extractor handles this:
1. Try `json.loads()` directly — if it parses, use it as-is.
2. Strip markdown fences (` ```json ... ``` `) if present.
3. Fall back to the `json-repair` library, which fixes trailing commas, unterminated strings, and control characters.
### 4b. Structural + semantic validation (`services/extractor/schemas.py`)
1. **Structural validation** — parse the JSON against the `ExtractionResult` Pydantic model. Missing required fields, wrong types, or out-of-range values fail here.
2. **Semantic validation** — cross-field consistency checks:
- Ticker format validation
- Evidence span length checks
- Catalyst type alias normalization (maps variants to canonical enum values)
- Impact horizon normalization
3. Returns a `ValidationReport` with the parsed result or a list of errors.
### 4c. Retry logic
If validation fails and the error is retryable (not an HTTP 4xx client error), the extractor retries up to `max_retries` times (default 2) with exponential backoff. Every attempt — raw output, validation result, error, duration — is preserved in the `ExtractionResponse.attempts` list for audit.
---
## 5. Persistence: Document Intelligence and Impact Records
A successful extraction produces two sets of database records.
### Document intelligence (`document_intelligence` table)
One row per document:
- `document_id`, `document_type`, `summary`, `companies` (JSONB), `macro_themes`
- `novelty_score`, `source_credibility`, `confidence`, `extraction_warnings`
- `validation_status` (valid/failed)
- `model` metadata: provider, model_name, prompt_version, schema_version
### Per-company impact records (`document_impact_records` table)
One row per company mentioned in the extraction:
- `ticker`, `company_name`, `relevance`, `sentiment`, `impact_score`, `impact_horizon`, `catalyst_type`
- `key_facts`, `risks`, `evidence_spans` (all JSONB)
- Links back to `document_intelligence` via `intelligence_id`
### Raw artifact storage (MinIO)
Full prompts and raw model responses are stored in MinIO buckets (`stonks-llm-prompts`, `stonks-llm-results`) keyed by `document_id`, so any extraction can be replayed or audited.
---
## 6. Signal Scoring: Turning Records into Weighted Signals
The aggregation engine (`services/aggregation/worker.py`) converts raw impact records into `WeightedSignal` objects. Each signal carries a composite weight that determines how much it influences the final trend.
### Weight components (`services/aggregation/scoring.py`)
The combined weight is:
```
combined = gate × recency × credibility × (1 + novelty_bonus) × market_context_multiplier
```
| Component | Formula | Purpose |
|---|---|---|
| **Confidence gate** | 0 if extraction confidence < 0.2, else 1 | Reject unreliable extractions entirely |
| **Recency decay** | `2^(-age_hours / half_life)`, min 0.01 | Exponential decay — newer documents matter more. Half-lives: intraday=2h, 1d=12h, 7d=72h, 30d=240h, 90d=720h |
| **Credibility** | `source_credibility ^ exponent`, clamped [0.1, 1.0] | Source quality weighting |
| **Novelty bonus** | `novelty_score × 0.25` | Novel information gets up to 25% boost |
| **Market context** | Volatility boost (up to +30%) + volume surge boost (+15%) | Fast-moving, high-volume markets amplify fresh signals |
Each `WeightedSignal` also carries:
- `sentiment_value`: +1.0 (positive), -1.0 (negative), 0.0 (neutral/mixed)
- `impact_score`: the extraction's impact magnitude
- `document_id`: for evidence tracing
---
## 7. Three Signal Layers
The aggregation engine merges signals from three independent layers. Each layer can be toggled on/off at runtime via the `risk_configs` table — no restart needed.
### Layer 1: Company-specific signals (always active)
Direct document intelligence about a company. This is the core layer — `document_impact_records` for the ticker, scored as described in §6.
### Layer 2: Macro signals (toggle: `macro_enabled`)
Global events that affect companies through exposure profiles.
**Flow:**
1. The macro service classifies global events (from news) using Ollama — extracting event type, severity, affected regions/sectors/commodities.
2. Each company has an **exposure profile** (`exposure_profiles` table): geographic revenue mix, supply chain regions, commodity dependencies, market position tier.
3. **Overlap scoring** computes how much a global event overlaps with a company's exposure (geographic, supply chain, commodity dimensions).
4. A **resilience modifier** based on market position tier (global leaders are more resilient than domestic companies) adjusts the score.
5. The final `macro_impact_score = base_score × overlap_factor × resilience_modifier`.
6. Events older than 48 hours get accelerated staleness decay.
Macro signals are converted to `WeightedSignal` objects with:
- `sentiment_value` mapped from `impact_direction` (positive → +1, negative → -1)
- `impact_score = macro_impact_score × macro_signal_weight` (default weight: 0.3)
- Recency decay from the global event's publication time
### Layer 3: Competitive signals (toggle: `competitive_enabled`)
Historical patterns and cross-company signal propagation.
**Flow:**
1. **Self-company pattern mining** (`services/aggregation/pattern_matcher.py`): For each catalyst type in the current impact records, query historical outcomes for this ticker. Lookback: 180 days for routine signals, 365 days for major decisions (1.3× weight multiplier). Produces `HistoricalPattern` objects with `bullish_pct`, `bearish_pct`, `avg_strength`, `pattern_confidence`.
2. **Cross-company propagation** (`services/aggregation/signal_propagation.py`): When company A has a catalyst, look up its competitors via the `competitor_relationships` table (46 relationships across 50 companies). For each competitor, query cross-company historical patterns. Signal strength = `avg_strength × relationship_strength × pattern_confidence × impact_score`. Direction = majority historical outcome (bullish or bearish).
3. Competitive signals are converted to `WeightedSignal` objects with:
- `impact_score = signal_strength × competitive_signal_weight` (default weight: 0.2)
- Recency decay from the pattern's most recent data point or the signal's `computed_at` time
### Merging
All three layers produce `WeightedSignal` objects with the same structure. The aggregation engine simply concatenates them into a single list before computing the trend summary. The relative influence of each layer is controlled by the `macro_signal_weight` (0.3) and `competitive_signal_weight` (0.2) multipliers applied to their impact scores.
---
## 8. Trend Summary Assembly
From the merged signal list, the aggregation engine computes a `TrendSummary` for each ticker × window combination (intraday, 1d, 7d, 30d, 90d).
### Weighted sentiment average
```
avg_sentiment = Σ(sentiment_value × combined_weight × impact_score) / Σ(combined_weight × impact_score)
```
### Trend direction
| Condition | Direction |
|---|---|
| `avg_sentiment ≥ 0.15` | **Bullish** |
| `avg_sentiment ≤ -0.15` | **Bearish** |
| Contradiction > 0.10 and \|avg_sentiment\| < 0.30 | **Mixed** |
| Otherwise | **Neutral** |
### Trend strength
`strength = min(|avg_sentiment|, 1.0)` — the absolute magnitude of the weighted sentiment, clamped to [0, 1].
### Contradiction score
Measures disagreement among signals:
```
contradiction = minority_side_weight / total_weight
```
Where minority side is whichever of positive or negative has less total weight. A score of 0 means full agreement; approaching 0.5 means equal-weight disagreement.
The system also runs multi-dimensional contradiction detection (`services/aggregation/contradiction.py`):
- **Sentiment disagreement** — the core positive-vs-negative split
- **Catalyst disagreement** — same catalyst type with opposing sentiment from different documents
### Confidence
Derived from four factors:
- **Unique source count** — more distinct documents = higher confidence (caps at 15 unique sources for 0.8 contribution)
- **Average extraction confidence** — from the model's self-assessed quality
- **Signal agreement** — fraction of signals pointing the same direction, dampened by sample size (log₂ scaling, saturates around 7 unique sources)
- **Contradiction penalty** — `contradiction_score × 0.4` subtracted
### Evidence ranking
Supporting and opposing documents are ranked by a composite score considering weight, impact, recency, and confidence — not just raw weight. The top 10 of each are stored for citation.
### Catalysts and risks
Dominant catalyst types are ranked by cumulative signal weight. Material risks are deduplicated and ordered by the weight of the signal that surfaced them.
### Persistence
The assembled `TrendSummary` is upserted into the `trend_windows` table (one row per entity × window, updated each cycle). A snapshot is also appended to `trend_history` for time-series charting. Evidence mappings go into `trend_evidence` with per-document rank scores and component breakdowns.
---
## 9. Trend Projections
After assembling the current trend, the engine computes a forward-looking projection (`services/aggregation/projection.py`):
- **Macro decay** — projects macro event impact forward with exponential decay based on estimated duration and severity
- **Momentum** — trend momentum from recent price action
- **Driving factors** — lists key macro events, competitive patterns, and market conditions
- **Divergence detection** — flags when the projection diverges from the current trend direction
Output: `TrendProjection` with `projected_direction`, `projected_strength`, `projected_confidence`, `projection_horizon`, `driving_factors`, and `diverges_from_current`. Projections with confidence below 0.3 are flagged as `low_confidence` and excluded from thesis generation.
---
## 10. Recommendation Generation
The recommendation service (`services/recommendation/worker.py`) turns trend summaries into actionable recommendations.
### Step 1: Data quality suppression (`services/recommendation/suppression.py`)
Before any eligibility check, the system evaluates the quality of the underlying data:
| Check | Threshold | Effect |
|---|---|---|
| Average extraction confidence | < 0.40 | Suppress |
| Evidence staleness | > 168 hours (7 days) | Suppress |
| Source type diversity | < 1 distinct type | Suppress |
| Extraction failure rate | > 50% | Suppress |
| Valid document count | < 2 | Suppress |
| Overall data quality score | < 0.30 | Suppress |
The data quality score is a weighted composite: 40% extraction confidence + 30% evidence freshness + 30% document coverage.
**Safety suppression** — two additional rules prevent trading on thin evidence from a single signal layer:
- **Macro-only suppression**: If the trend direction is driven solely by macro signals with zero company-specific evidence, the recommendation is forced to informational mode.
- **Pattern-only suppression**: Same rule for pattern/competitive signals with no company or macro support.
### Step 2: Eligibility evaluation (`services/recommendation/eligibility.py`)
Deterministic rules — no model involvement:
**Gate checks** (any failure → no recommendation):
- Confidence ≥ 0.35
- Trend strength ≥ 0.10
- Contradiction score ≤ 0.60
- Evidence count ≥ 2
- Direction ≠ neutral
**Action mapping:**
- Strong bullish (strength ≥ 0.25) → **BUY**
- Strong bearish (strength ≥ 0.25) → **SELL**
- Weak but directional + decent confidence (≥ 0.50) → **HOLD**
- Everything else → **WATCH**
**Mode escalation:**
- WATCH and HOLD → always **informational** (no trades)
- BUY/SELL with confidence ≥ 0.70, contradiction ≤ 0.25, evidence ≥ 5 → **live_eligible**
- BUY/SELL with confidence ≥ 0.50 → **paper_eligible**
- Below that → **informational**
### Step 3: Position sizing
Computed from signal quality:
```
raw_portfolio_pct = base (1%) + confidence × strength × range (up to 10%)
```
Adjusted by:
- Contradiction penalty (higher contradiction → smaller position)
- Evidence count penalty (< 3 docs → 50% reduction, < 5 docs → 75%)
- Max loss percentage scales similarly (base 0.3% up to 2%)
### Step 4: Thesis generation
Two layers:
1. **Deterministic thesis** — assembled from trend direction, strength, catalysts, risks, contradiction notes, projection info, and the recommended action. Always generated.
2. **Optional LLM rewrite** (`services/recommendation/thesis_llm.py`) — for trading-eligible recommendations only, the deterministic thesis is rewritten into analyst-quality prose via Ollama. This is cosmetic; the underlying decision is unchanged.
### Step 5: Risk classification
Based on contradiction score, confidence, evidence count, and mode:
- `low` — high confidence, low contradiction, strong evidence
- `moderate` — decent signals with some uncertainty
- `high` — notable contradiction or low evidence
- `very_high` — multiple risk factors present
The thesis is prefixed with the risk label: `[risk:moderate] AAPL shows a bullish trend...`
### Step 6: Persistence
- `recommendations` table — the full recommendation record
- `recommendation_evidence` table — per-document citations with weights and evidence types
- `risk_evaluations` table — the eligibility decision, risk checks, and full decision trace
---
## 11. Trading Engine Decision Loop
The trading engine (`services/trading/engine.py`) polls the `recommendations` table every 60 seconds for actionable recommendations (`action IN ('buy', 'sell')` and `mode IN ('paper_eligible', 'live_eligible')`).
### Pre-trade checks (in order, first failure short-circuits)
1. **Circuit breaker** — is the daily loss cap or single-position loss cap breached? If so, all trading halts.
2. **Trading window** — is the market open? Outside market hours, skip.
3. **Confidence gate** — does the recommendation meet the active risk tier's minimum confidence?
4. **Deduplication** — has this recommendation already been processed?
5. **Declining positions** — are there multiple open positions currently declining?
6. **Max open positions** — is the portfolio at capacity?
### Position sizing (`services/trading/position_sizer.py`)
Computes the dollar amount and share quantity:
- Confidence-based scaling (sample-size-dampened agreement scoring)
- Risk tier adjustment (conservative / moderate / aggressive)
- Portfolio heat check (sector concentration, correlation)
- Active pool available capital
- Absolute position cap
### Stop-loss and take-profit (`services/trading/stop_loss_manager.py`)
- Stop-loss = entry price (ATR × atr_multiplier)
- Take-profit = entry price + (ATR × atr_multiplier × reward_risk_ratio)
- Trailing stops activate for open positions
### Additional checks
- **Correlation-aware diversification** — reject positions that would push portfolio correlation above threshold
- **Earnings calendar awareness** — reduce size or skip if earnings are within 2 days
- **Gradual entry** — large positions (> $30) split into 3 tranches over time
- **Reserve pool** — profits from closed positions siphon into an emergency liquidity reserve
### Risk tier auto-adjustment (`services/trading/risk_tier_controller.py`)
Daily evaluation of Sharpe ratio, drawdown, and win rate. The engine auto-adjusts between conservative, moderate, and aggressive tiers. The new tier is persisted to `risk_configs` and takes effect on the next cycle.
### Output: Trading Decision
Every evaluation produces a `TradingDecision` record persisted for audit:
- `decision`: act or skip
- `skip_reason`: which check failed (if any)
- `computed_position_size`, `computed_share_quantity`
- `risk_tier_at_decision`, `portfolio_heat_at_decision`, `active_pool_at_decision`
- `circuit_breaker_status`, `correlation_check_result`, `sector_exposure_check_result`
- `earnings_proximity_flag`, `is_micro_trade`, `decision_trace`
If the decision is **act**, an order job is pushed to the Redis broker queue (`stonks:queue:broker`) with ticker, action, quantity, and order type.
---
## 12. The Complete Data Flow (Summary)
```
Document (article/filing/transcript)
Ollama extraction (JSON)
│ ├─ JSON repair (json-repair library)
│ └─ Pydantic validation + semantic checks
document_intelligence + document_impact_records (PostgreSQL)
│ └─ Raw prompts/responses → MinIO (audit)
├──────────────────────────────────────────────────┐
│ │
▼ ▼
Layer 1: Company signals Layer 2: Macro signals
(impact records → WeightedSignal) (global_events → exposure matching
→ macro_impact_records → WeightedSignal)
│ │
│ Layer 3: Competitive signals │
│ (pattern mining + propagation │
│ → competitive_signal_records │
│ → WeightedSignal) │
│ │ │
└───────────┬───────────────┘───────────────────────┘
Signal merging (concatenate all WeightedSignals)
Trend summary assembly
(weighted sentiment → direction, strength, confidence,
contradiction, evidence ranking, catalysts, risks)
├─→ trend_windows (PostgreSQL)
├─→ trend_history (time-series)
└─→ trend_evidence (per-document rankings)
Trend projection (forward-looking)
Data quality suppression
(extraction confidence, staleness, diversity,
macro-only / pattern-only safety)
Eligibility evaluation
(gate checks → action mapping → mode escalation → position sizing)
Thesis generation + risk classification
├─→ recommendations (PostgreSQL)
├─→ recommendation_evidence
└─→ risk_evaluations
Trading engine decision loop
(pre-trade checks → position sizing → stop-loss →
correlation → earnings → gradual entry)
├─→ trading_decisions (PostgreSQL, audit)
└─→ stonks:queue:broker (Redis, order execution)
```
---
## 13. Key Database Tables
| Table | Stage | Purpose |
|---|---|---|
| `documents` | Ingestion | Raw ingested content |
| `document_intelligence` | Extraction | Ollama extraction output |
| `document_impact_records` | Extraction | Per-company impact from a document |
| `global_events` | Macro | Classified macro/geopolitical events |
| `exposure_profiles` | Macro | Company exposure data (geography, supply chain, commodities) |
| `macro_impact_records` | Macro | Per-company macro impact scores |
| `competitor_relationships` | Competitive | Company relationship graph |
| `competitive_signal_records` | Competitive | Cross-company propagated signals |
| `trend_windows` | Aggregation | Current trend summaries (upserted each cycle) |
| `trend_history` | Aggregation | Time-series snapshots for charting |
| `trend_evidence` | Aggregation | Per-document evidence rankings |
| `trend_projections` | Projection | Forward-looking trend projections |
| `recommendations` | Recommendation | Trade recommendations |
| `recommendation_evidence` | Recommendation | Per-document citations |
| `risk_evaluations` | Recommendation | Eligibility decisions and risk checks |
| `risk_configs` | Runtime | Toggle switches and risk tier configuration |
| `trading_decisions` | Trading | Pre-trade evaluation audit trail |
| `positions` | Trading | Open positions |
| `orders` | Trading | Broker orders |
| `fills` | Trading | Order fills |
---
## 14. Audit Trail
Every stage preserves full context for reproducibility:
- **Extraction**: raw Ollama response, repair steps, validation errors, all retry attempts
- **Aggregation**: per-signal weight breakdowns (recency, credibility, novelty, market context), contradiction details by dimension
- **Recommendation**: deterministic thesis, evidence citations with weights, eligibility decision trace, risk evaluation
- **Trading**: every pre-trade check result, position sizing breakdown, risk tier at decision time, full decision trace
- **Execution**: order details, fills, P&L, performance metrics