# Design Document — Dual-Pipeline Signal Engine

## Overview

The dual-pipeline signal engine is a new service at `services/signal_engine/` that runs as an independent Kubernetes deployment alongside the existing aggregation → recommendation pipeline. It implements a concurrent dual-pipeline architecture: a heuristic pipeline (deterministic scoring) and a probabilistic pipeline (Bayesian inference) evaluate the same normalized inputs for each ticker on every evaluation tick, and each produces an independent BUY/WATCH/SKIP verdict. A delta analyzer compares the two verdicts, and an output formatter assembles a structured `SignalOutput` contract that is published to the existing `trading_decisions` Redis queue.

The engine introduces several new components — Input Normalizer, Signal Library (Fibonacci, MA Stack, RSI, Cup & Handle, Elliott Wave), Multi-Timeframe Engine, Hard Filter Engine, Exit Engine, Delta Analyzer, and Output Formatter — while reusing existing infrastructure: `compute_signal_weight`, `compute_bayesian_posterior`, `classify_regime`, `WeightedSignal`, `BayesianPosterior`, and `RegimeClassification` from `services/aggregation/`.

The service is toggled via `dual_pipeline_enabled` in the `risk_configs` table (default: false, fail-safe). When disabled, the existing pipeline operates unchanged. When enabled, the signal engine runs alongside the existing pipeline with support for shadow mode (dual-pipeline output persisted but not forwarded to trading).

### Design Rationale

- **Separate service, not inline extension**: The signal engine has a fundamentally different evaluation cadence (multi-timeframe technical signals) and data flow (OHLCV bars, not document intelligence). Embedding it in the aggregation worker would couple two distinct concerns.
- **Reuse existing math**: The Bayesian posterior, regime classification, and signal weighting functions are already exercised by the existing pipeline. The probabilistic pipeline wraps them with regime-based priors and likelihood ratio accumulation rather than reimplementing them.
- **Concurrent pipelines via asyncio.gather**: Both pipelines receive the same `NormalizedInput` instance and run concurrently. If one pipeline fails, the other completes normally and the failed pipeline reports a SKIP verdict.
- **Signal clustering for correlation penalty**: The Bayesian pipeline groups signals into four clusters (momentum, structure, volatility, fundamentals) and applies a 0.5^(n-1) decay within each cluster, so correlated signals cannot inflate the posterior by stacking likelihood ratios.
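
The concurrency and failure-isolation behaviour described in the rationale can be sketched as follows. This is a minimal illustration, not the service's actual code: `PipelineResult`, `heuristic_stub`, `probabilistic_stub`, and `run_both` are hypothetical names, and running the synchronous pipeline functions through `asyncio.to_thread` is an assumption about how they would be made awaitable.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PipelineResult:
    """Hypothetical stand-in for HeuristicResult / ProbabilisticResult."""
    verdict: str
    reasoning: list[str]


def heuristic_stub(normalized: dict) -> PipelineResult:
    # Hypothetical placeholder for run_heuristic_pipeline().
    return PipelineResult("BUY", ["stub heuristic"])


def probabilistic_stub(normalized: dict) -> PipelineResult:
    # Hypothetical placeholder that fails, to exercise the fallback path.
    raise ValueError("no likelihood ratios available")


async def run_both(normalized: dict) -> tuple[PipelineResult, PipelineResult]:
    # Each synchronous pipeline runs in a worker thread. With
    # return_exceptions=True, an exception is returned as a value instead
    # of propagating, so the other pipeline's result is still collected.
    results = await asyncio.gather(
        asyncio.to_thread(heuristic_stub, normalized),
        asyncio.to_thread(probabilistic_stub, normalized),
        return_exceptions=True,
    )
    # A failed pipeline degrades to a SKIP verdict instead of raising.
    resolved = [
        PipelineResult("SKIP", [f"pipeline failed: {r!r}"])
        if isinstance(r, BaseException) else r
        for r in results
    ]
    return resolved[0], resolved[1]


heuristic, probabilistic = asyncio.run(run_both({"ticker": "AAPL"}))
```

Here the heuristic stub returns BUY while the failing probabilistic stub is converted into a SKIP result that carries the exception text in its reasoning.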

---

## Architecture

### High-Level Flow

```mermaid
graph TD
    A[Evaluation Tick<br/>Redis queue: signal_engine] --> B[Input Normalizer]
    B --> C[Hard Filter Engine]
    C -->|filtered out| D[SKIP verdict for both pipelines]
    C -->|passed| E[Signal Library]
    E --> F[Multi-Timeframe Engine]
    F --> G{asyncio.gather}
    G --> H[Heuristic Pipeline]
    G --> I[Probabilistic Pipeline]
    H --> J[Delta Analyzer]
    I --> J
    J --> K[Output Formatter]
    K --> L[SignalOutput]
    L --> M[Redis: trading_decisions queue]
    L --> N[PostgreSQL: signal_engine_outputs]

    subgraph Exit Path
        B --> O[Exit Engine]
        O --> K
    end
```

### Trigger Mechanism

The signal engine polls a new Redis queue, `stonks:queue:signal_engine`. The scheduler service enqueues an evaluation tick after aggregation completes for a ticker. Each queue message is a JSON object of the form `{"ticker": "AAPL", "triggered_at": "2024-01-15T10:00:00Z"}`.

### Integration Points

| Component | Integration | Direction |
|---|---|---|
| Scheduler | Enqueues ticks to `signal_engine` queue | Scheduler → Signal Engine |
| Market data tables | OHLCV bars, closing prices, returns | Signal Engine reads |
| `macro_impact_records` | Macro bias computation | Signal Engine reads |
| `trend_windows` | Fundamental/valuation context | Signal Engine reads |
| `risk_configs` | Feature flags, thresholds | Signal Engine reads |
| `classify_regime()` | Regime classification for priors | Signal Engine calls |
| `compute_signal_weight()` | Heuristic signal weighting | Signal Engine calls |
| `compute_bayesian_posterior()` | Bayesian accumulation | Signal Engine calls |
| Redis `trading_decisions` | SignalOutput publication | Signal Engine → Trading Engine |
| `signal_engine_outputs` table | Persistence for audit | Signal Engine writes |
| Redis rolling agreement | Delta analyzer metrics | Signal Engine writes |

---

## Components and Interfaces

### Module Structure

```
services/signal_engine/
├── __init__.py
├── main.py                  # Entry point: asyncio event loop, queue polling
├── worker.py                # Top-level orchestrator per evaluation tick
├── config.py                # SignalEngineConfig, loaded from risk_configs + env
├── models.py                # All Pydantic models (NormalizedInput, SignalResult, etc.)
├── normalizer.py            # Input Normalizer — fetches and assembles NormalizedInput
├── signals/
│   ├── __init__.py
│   ├── base.py              # SignalEvaluator protocol, SignalResult model
│   ├── fibonacci.py         # Fibonacci retracement evaluator
│   ├── ma_stack.py          # Moving average stack evaluator
│   ├── rsi.py               # RSI evaluator
│   ├── cup_handle.py        # Cup & Handle pattern detector
│   └── elliott_wave.py      # Elliott Wave detector
├── confluence.py            # Multi-Timeframe Confluence Engine
├── hard_filter.py           # Hard Filter Engine
├── heuristic.py             # Heuristic Pipeline (Pipeline A)
├── probabilistic.py         # Probabilistic Pipeline (Pipeline B)
├── correlation.py           # Signal cluster classification + correlation penalty
├── exit_engine.py           # Exit Engine — position-level exit management
├── delta.py                 # Delta Analyzer
├── formatter.py             # Output Formatter
└── persistence.py           # Database persistence for signal_engine_outputs
```

### Key Function Signatures

#### `main.py` — Entry Point

```python
async def main() -> None:
    """Start the signal engine worker loop.
    
    Connects to PostgreSQL and Redis, loads config from risk_configs,
    and polls the signal_engine queue indefinitely.
    """
```

#### `worker.py` — Orchestrator

```python
async def evaluate_tick(
    pool: asyncpg.Pool,
    redis: redis.asyncio.Redis,
    ticker: str,
    config: SignalEngineConfig,
) -> SignalOutput | None:
    """Run a full evaluation tick for a single ticker.
    
    1. Normalize inputs
    2. Evaluate exit conditions for open positions
    3. Run hard filters
    4. Evaluate signals across timeframes
    5. Run both pipelines concurrently
    6. Compute delta analysis
    7. Format and publish output
    
    Returns None if the ticker is hard-filtered or both pipelines fail.
    """
```

#### `normalizer.py` — Input Normalizer

```python
async def normalize_input(
    pool: asyncpg.Pool,
    ticker: str,
    config: SignalEngineConfig,
) -> NormalizedInput:
    """Fetch and assemble all data needed for a single evaluation tick.
    
    Sources:
    - OHLCV bars from market_data_bars (M30, H1, H4, D, W, M)
    - Fundamental metrics from trend_windows + companies
    - Macro context from macro_impact_records + global_events
    - Open position state from the trading engine's portfolio
    
    Missing data sources produce sentinel values (None/empty list)
    with a logged warning.
    """
```

#### `signals/base.py` — Signal Evaluator Protocol

```python
from typing import Protocol

class SignalEvaluator(Protocol):
    """Protocol for all signal evaluators in the Signal Library."""
    
    def evaluate(
        self,
        bars: list[OHLCVBar],
        timeframe: str,
    ) -> SignalResult | None:
        """Evaluate a signal on a single timeframe's bar data.
        
        Returns None when insufficient data is available.
        """
        ...
```

#### `confluence.py` — Multi-Timeframe Engine

```python
def compute_confluence(
    signal_results: dict[str, dict[str, SignalResult]],
    weights: dict[str, float],
) -> list[ConfluenceSignal]:
    """Compute weighted confluence scores across timeframes.
    
    Args:
        signal_results: {signal_type: {timeframe: SignalResult}}
        weights: {timeframe: weight} e.g. {"M30": 0.03, "D": 0.30, ...}
    
    Returns:
        List of ConfluenceSignal objects that pass the minimum
        confluence threshold (≥2 timeframes, ≥1 of D/W/M).
    """
```

#### `hard_filter.py` — Hard Filter Engine

```python
def evaluate_hard_filters(
    normalized: NormalizedInput,
    config: HardFilterConfig,
) -> HardFilterResult:
    """Evaluate pre-pipeline hard filters.
    
    Checks:
    - macro_bias == -1.0 → SKIP
    - valuation_score < threshold → SKIP
    - earnings_proximity_days <= threshold → SKIP
    
    Returns HardFilterResult with filtered=True/False and all triggered reasons.
    """
```

#### `heuristic.py` — Heuristic Pipeline

```python
def run_heuristic_pipeline(
    normalized: NormalizedInput,
    confluence_signals: list[ConfluenceSignal],
    config: HeuristicConfig,
) -> HeuristicResult:
    """Run the deterministic heuristic pipeline.
    
    Computes S_total = S_company + S_macro + S_competitive using
    existing compute_signal_weight() and weighted sentiment averaging.
    Produces BUY/WATCH/SKIP verdict based on confidence and score thresholds.
    """
```

#### `probabilistic.py` — Probabilistic Pipeline

```python
def run_probabilistic_pipeline(
    normalized: NormalizedInput,
    confluence_signals: list[ConfluenceSignal],
    regime: RegimeClassification,
    config: ProbabilisticConfig,
) -> ProbabilisticResult:
    """Run the Bayesian probabilistic pipeline.
    
    1. Initialize regime-based prior (bull=0.58, range=0.50, bear=0.42)
    2. Compute likelihood ratios per signal with correlation penalty
    3. Accumulate via log-odds: logit(P_post) = logit(P_prior) + Σ log(LR_i)
    4. Apply entropy gating
    5. Compute EV_R = P_up · E[win_R] - (1 - P_up) · 1.0
    6. Produce BUY/WATCH/SKIP verdict
    """
```
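
The arithmetic in steps 3 to 5 can be worked through with a small standalone sketch. The function names are hypothetical, and measuring entropy in bits (so that it falls in [0, 1], matching the bounds on `ProbabilisticResult.entropy`) is an assumption.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def posterior_from_lrs(prior: float, log_lrs: list[float]) -> float:
    # logit(P_post) = logit(P_prior) + sum(log LR_i), then invert the logit.
    z = logit(prior) + sum(log_lrs)
    return 1.0 / (1.0 + math.exp(-z))


def binary_entropy(p: float) -> float:
    # Shannon entropy in bits: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def expected_value_r(p_up: float, expected_win_r: float) -> float:
    # EV_R = P_up * E[win_R] - (1 - P_up) * 1.0, with the loss fixed at 1R.
    return p_up * expected_win_r - (1.0 - p_up) * 1.0


# Range-regime prior of 0.50 with two signals whose LRs are 2.0 and 1.5.
p_up = posterior_from_lrs(0.50, [math.log(2.0), math.log(1.5)])
entropy = binary_entropy(p_up)
ev_r = expected_value_r(p_up, expected_win_r=2.0)
```

With a prior of 0.50 the prior odds are 1, so the posterior odds equal the product of the likelihood ratios (3.0), giving `p_up` = 0.75. For an assumed expected win of 2R, EV_R is 0.75 · 2 − 0.25 = 1.25.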

#### `correlation.py` — Signal Correlation Penalty

```python
from enum import Enum


class SignalCluster(str, Enum):
    MOMENTUM = "momentum"       # MA stack, RSI
    STRUCTURE = "structure"     # Fibonacci, Elliott Wave
    VOLATILITY = "volatility"   # ATR-based, Bollinger-derived
    FUNDAMENTALS = "fundamentals"  # valuation, earnings, macro

def classify_signal(signal_type: str) -> SignalCluster:
    """Map a signal type to its correlation cluster."""

def apply_correlation_penalty(
    likelihood_ratios: list[LikelihoodRatio],
) -> list[LikelihoodRatio]:
    """Apply within-cluster decay penalty to correlated signals.
    
    Within each cluster, signals are ranked by LR magnitude.
    The strongest contributes at full weight; the n-th ranked signal
    (n = 1 for the strongest) contributes at weight 0.5^(n-1).
    
    Cross-cluster signals are independent (no penalty).
    """
```
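
One way to read the penalty is sketched below. It assumes that "LR magnitude" means the absolute log-likelihood ratio (so strong bearish evidence ranks alongside strong bullish evidence) and that the decay multiplies the log-LR. `LR` is a trimmed stand-in for `LikelihoodRatio`, and `penalize` is a hypothetical name.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LR:
    """Trimmed stand-in for LikelihoodRatio."""
    signal_type: str
    cluster: str
    log_lr: float
    penalized_log_lr: float = 0.0


def penalize(ratios: list[LR], decay: float = 0.5) -> list[LR]:
    by_cluster: dict[str, list[LR]] = defaultdict(list)
    for r in ratios:
        by_cluster[r.cluster].append(r)
    for members in by_cluster.values():
        # Assumption: rank by |log LR|, strongest first.
        members.sort(key=lambda r: abs(r.log_lr), reverse=True)
        for rank, r in enumerate(members):
            # rank 0 keeps full weight; rank k is scaled by decay ** k,
            # which is 0.5^(n-1) for the n-th ranked signal.
            r.penalized_log_lr = r.log_lr * decay ** rank
    # The input objects are updated in place and returned in input order.
    return ratios


ratios = penalize([
    LR("ma_stack", "momentum", math.log(2.0)),
    LR("rsi", "momentum", math.log(1.5)),
    LR("fibonacci", "structure", math.log(1.8)),
])
```

In this example `ma_stack` and `fibonacci` keep their full log-LR because each is the strongest in its cluster, while `rsi` is halved as the second momentum signal.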

#### `exit_engine.py` — Exit Engine

```python
def evaluate_exits(
    positions: list[OpenPositionState],
    current_prices: dict[str, float],
    config: ExitConfig,
) -> list[ExitSignal]:
    """Evaluate exit conditions for all open positions.
    
    Checks: stop_loss hit, target_1 hit (EXIT_HALF), target_2 hit (EXIT_FULL),
    trailing stop hit (EXIT_FULL for remaining).
    
    Trailing stop activates after EXIT_HALF and ratchets upward only.
    """
```

#### `delta.py` — Delta Analyzer

```python
async def analyze_delta(
    heuristic: HeuristicResult,
    probabilistic: ProbabilisticResult,
    redis: redis.asyncio.Redis,
    ticker: str,
) -> DeltaResult:
    """Compare pipeline verdicts and track agreement metrics.
    
    Computes agreement flag, confidence delta, disagreement reasons.
    Updates rolling 100-evaluation agreement rate in Redis.
    Logs warning when agreement rate drops below 0.50.
    """
```
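
The rolling agreement rate can be illustrated with an in-memory window. This sketch deliberately replaces Redis with a `collections.deque` so it stays self-contained, and it assumes that "agreement" means the two verdicts are identical. `RollingAgreement` is a hypothetical name.

```python
from collections import deque


class RollingAgreement:
    """In-memory stand-in for the Redis-backed rolling window."""

    def __init__(self, window: int = 100, warn_below: float = 0.50) -> None:
        # maxlen makes the deque drop the oldest flag automatically.
        self._flags: deque[bool] = deque(maxlen=window)
        self.warn_below = warn_below

    def record(self, heuristic_verdict: str, probabilistic_verdict: str) -> float:
        # Assumption: agreement means the two verdicts are identical.
        self._flags.append(heuristic_verdict == probabilistic_verdict)
        return self.rate()

    def rate(self) -> float:
        return sum(self._flags) / len(self._flags) if self._flags else 1.0

    def should_warn(self) -> bool:
        # Mirrors the documented warning when the rate drops below 0.50.
        return bool(self._flags) and self.rate() < self.warn_below


tracker = RollingAgreement(window=4)
rates = [
    tracker.record(h, p)
    for h, p in [("BUY", "BUY"), ("BUY", "SKIP"), ("WATCH", "SKIP"),
                 ("SKIP", "BUY"), ("BUY", "BUY")]
]
```

A window of 4 is used here only to keep the example short; the document specifies a window of 100 evaluations.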

#### `formatter.py` — Output Formatter

```python
def format_output(
    ticker: str,
    price: float,
    heuristic: HeuristicResult,
    probabilistic: ProbabilisticResult,
    delta: DeltaResult,
    exit_signals: list[ExitSignal],
    config: SignalEngineConfig,
) -> SignalOutput:
    """Assemble the structured SignalOutput contract.
    
    Populates trade_plan based on verdict combination:
    - Both BUY → dual_confirmed, full position sizing
    - Probabilistic-only BUY → probabilistic_only, 50% position sizing
    - Heuristic-only BUY → standard position sizing
    - No BUY → no trade_plan (WATCH/SKIP persisted for analysis)
    """

def signal_output_to_recommendation(output: SignalOutput) -> Recommendation:
    """Map a SignalOutput to the existing Recommendation schema.
    
    Enables the trading engine to consume dual-pipeline outputs
    without modification to its core evaluate_recommendation logic.
    """
```

#### `persistence.py` — Database Persistence

```python
async def persist_signal_output(
    pool: asyncpg.Pool,
    output: SignalOutput,
) -> None:
    """Persist a SignalOutput to the signal_engine_outputs table.
    
    Logs and continues on database errors (persistence failure
    does not block signal emission to the trading queue).
    """
```

---

## Data Models

All new data models are Pydantic `BaseModel` subclasses defined in `services/signal_engine/models.py`. Existing models (`WeightedSignal`, `BayesianPosterior`, `RegimeClassification`, `TrendSummary`, `Recommendation`, `PositionSizing`) are imported from `services/aggregation/` and `services/shared/schemas.py`.

### OHLCVBar

```python
from datetime import datetime

from pydantic import BaseModel


class OHLCVBar(BaseModel):
    """Single OHLCV bar for a timeframe."""
    timestamp: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float
```

### NormalizedInput

```python
class NormalizedInput(BaseModel):
    """Unified input structure consumed by both pipelines."""
    ticker: str
    evaluated_at: datetime
    
    # Multi-timeframe OHLCV bars
    bars: dict[str, list[OHLCVBar]]  # {"M30": [...], "H1": [...], ...}
    
    # Fundamental metrics
    valuation_score: float | None = None  # [0.0, 1.0]
    earnings_proximity_days: int | None = None
    
    # Macro context
    macro_bias: float = 0.0  # [-1.0, 1.0]
    
    # Open position state (for exit engine)
    open_positions: list[OpenPositionState] = Field(default_factory=list)
    
    # Market data for regime classification
    closing_prices: list[float] = Field(default_factory=list)
    returns: list[float] = Field(default_factory=list)
    
    # Current price (latest close from shortest available timeframe)
    current_price: float | None = None
```

### OpenPositionState

```python
class OpenPositionState(BaseModel):
    """Snapshot of an open position for exit evaluation."""
    position_id: str
    ticker: str
    entry_price: float
    current_price: float
    stop_loss: float
    target_1: float
    target_2: float
    trailing_stop: float | None = None
    partial_exit_done: bool = False
    atr: float | None = None
```

### SignalResult

```python
class SignalDirection(str, Enum):
    BULLISH = "bullish"
    BEARISH = "bearish"
    NEUTRAL = "neutral"

class SignalResult(BaseModel):
    """Output from a single signal evaluator on a single timeframe."""
    signal_type: str          # e.g. "fibonacci", "ma_stack", "rsi"
    timeframe: str            # e.g. "D", "H4"
    strength: float = Field(ge=0.0, le=1.0)
    direction: SignalDirection
    confidence: float = Field(ge=0.0, le=1.0)
    metadata: dict = Field(default_factory=dict)  # signal-specific details
```

### ConfluenceSignal

```python
class ConfluenceSignal(BaseModel):
    """A signal that passed multi-timeframe confluence filtering."""
    signal_type: str
    direction: SignalDirection
    confluence_score: float  # weighted sum across timeframes
    active_timeframes: list[str]  # which timeframes triggered
    per_timeframe: dict[str, float]  # {timeframe: strength}
```

### Verdict

```python
class Verdict(str, Enum):
    BUY = "BUY"
    WATCH = "WATCH"
    SKIP = "SKIP"
```

### HeuristicResult

```python
class HeuristicResult(BaseModel):
    """Output from the heuristic (deterministic) pipeline."""
    verdict: Verdict
    confidence: float = Field(ge=0.0, le=1.0)
    s_total: float
    s_company: float
    s_macro: float
    s_competitive: float
    signal_weights: list[dict] = Field(default_factory=list)
    reasoning: list[str] = Field(default_factory=list)
```

### LikelihoodRatio

```python
class LikelihoodRatio(BaseModel):
    """A single signal's likelihood ratio for Bayesian updating."""
    signal_type: str
    cluster: str  # SignalCluster value
    lr: float     # P(sig|up) / P(sig|down)
    log_lr: float  # log(lr)
    penalized_log_lr: float  # after correlation penalty
    hit_rate: float
    strength: float
```

### ProbabilisticResult

```python
class ProbabilisticResult(BaseModel):
    """Output from the probabilistic (Bayesian) pipeline."""
    verdict: Verdict
    p_up: float = Field(ge=0.0, le=1.0)
    entropy: float = Field(ge=0.0, le=1.0)
    ev_r: float
    prior: float
    posterior: float
    likelihood_ratios: list[LikelihoodRatio] = Field(default_factory=list)
    regime: str
    reasoning: list[str] = Field(default_factory=list)
```

### DeltaResult

```python
class DeltaResult(BaseModel):
    """Output from the delta analyzer comparing both pipelines."""
    agreement: bool
    confidence_delta: float
    heuristic_verdict: str
    probabilistic_verdict: str
    disagreement_reasons: list[str] = Field(default_factory=list)
    rolling_agreement_rate: float | None = None
```

### ExitSignal

```python
class ExitType(str, Enum):
    EXIT_HALF = "EXIT_HALF"
    EXIT_FULL = "EXIT_FULL"

class ExitSignal(BaseModel):
    """An exit signal for an open position."""
    position_id: str
    ticker: str
    exit_type: ExitType
    reason: str  # "stop_hit", "target_1_hit", "target_2_hit", "trailing_stop_hit"
    price: float
```

### TradePlan

```python
class TradePlan(BaseModel):
    """Optional trade plan attached to a BUY signal."""
    entry_price: float
    stop_loss: float
    target_1: float
    target_2: float
    position_size_pct: float = Field(ge=0.0, le=1.0)
    max_loss_pct: float = Field(ge=0.0, le=1.0)
    dual_confirmed: bool = False
    probabilistic_only: bool = False
```

### SignalOutput

```python
import uuid


class SignalOutput(BaseModel):
    """The structured output contract consumed by the trading engine and audit systems."""
    output_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    ticker: str
    timestamp: datetime
    price: float
    
    # Heuristic pipeline results
    heuristic_verdict: str
    heuristic_confidence: float
    heuristic_s_total: float
    
    # Probabilistic pipeline results
    probabilistic_verdict: str
    probabilistic_p_up: float
    probabilistic_entropy: float
    probabilistic_ev_r: float
    
    # Delta analysis
    delta_agreement: bool
    delta_confidence_delta: float
    delta_reasons: list[str] = Field(default_factory=list)
    
    # Optional trade plan (populated when at least one pipeline says BUY)
    trade_plan: TradePlan | None = None
    
    # Exit signals for open positions
    exit_signals: list[ExitSignal] = Field(default_factory=list)
    
    # Full pipeline results for audit (stored as JSONB)
    heuristic_detail: dict = Field(default_factory=dict)
    probabilistic_detail: dict = Field(default_factory=dict)
    
    # Pipeline mode metadata
    pipeline_mode: str = "dual_pipeline"
    shadow_mode: bool = False
```

### SignalEngineConfig

```python
from dataclasses import dataclass, field


@dataclass
class SignalEngineConfig:
    """Configuration loaded from risk_configs + environment."""
    dual_pipeline_enabled: bool = False
    heuristic_pipeline_enabled: bool = True
    probabilistic_pipeline_enabled: bool = True
    shadow_mode: bool = False
    
    # Timeframe weights
    timeframe_weights: dict[str, float] = field(default_factory=lambda: {
        "M30": 0.03, "H1": 0.07, "H4": 0.15,
        "D": 0.30, "W": 0.30, "M": 0.15,
    })
    
    # Hard filter thresholds
    hard_filter_valuation_min: float = 0.3
    hard_filter_earnings_days: int = 5
    hard_filter_macro_bias_skip: float = -1.0
    
    # Heuristic verdict thresholds
    heuristic_buy_confidence: float = 0.70
    heuristic_buy_s_total: float = 1.2
    heuristic_buy_valuation_min: float = 0.5
    heuristic_watch_confidence: float = 0.55
    
    # Probabilistic verdict thresholds
    prob_buy_p_up: float = 0.60
    prob_buy_entropy_max: float = 0.90
    prob_buy_ev_r_min: float = 1.5
    prob_buy_valuation_min: float = 0.5
    prob_watch_p_up: float = 0.55
    prob_watch_entropy_max: float = 0.95
    prob_entropy_skip: float = 0.95
    
    # Regime priors
    regime_prior_bull: float = 0.58
    regime_prior_range: float = 0.50
    regime_prior_bear: float = 0.42
    
    # Exit engine
    trailing_stop_atr_multiplier: float = 2.0
    
    # Polling
    polling_interval_seconds: int = 30
```

### HardFilterConfig / HeuristicConfig / ProbabilisticConfig / ExitConfig

These are derived from `SignalEngineConfig` fields for cleaner function signatures — simple `@dataclass` wrappers over the relevant subset of config values.
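
As a sketch of what such a wrapper might look like, the example below derives a hard-filter config from the engine config. The field names on `HardFilterConfig` and the `from_engine` constructor are assumptions, since the document does not list them. `EngineConfig` is a trimmed stand-in for `SignalEngineConfig`.

```python
from dataclasses import dataclass


@dataclass
class EngineConfig:
    """Trimmed stand-in for SignalEngineConfig (hard-filter fields only)."""
    hard_filter_valuation_min: float = 0.3
    hard_filter_earnings_days: int = 5
    hard_filter_macro_bias_skip: float = -1.0


@dataclass(frozen=True)
class HardFilterConfig:
    # Field names are assumptions; the document does not list them.
    valuation_min: float
    earnings_days: int
    macro_bias_skip: float

    @classmethod
    def from_engine(cls, cfg: EngineConfig) -> "HardFilterConfig":
        # Copy only the subset of values the hard filter engine needs.
        return cls(
            valuation_min=cfg.hard_filter_valuation_min,
            earnings_days=cfg.hard_filter_earnings_days,
            macro_bias_skip=cfg.hard_filter_macro_bias_skip,
        )


hard_cfg = HardFilterConfig.from_engine(EngineConfig(hard_filter_earnings_days=3))
```

Freezing the derived dataclass is a design choice in this sketch: it keeps a component from mutating thresholds that were loaded once per tick.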

---

### Database Migration (039)

```sql
-- Migration 039: Signal Engine Outputs
-- Creates the signal_engine_outputs table for persisting dual-pipeline evaluations.

CREATE TABLE IF NOT EXISTS signal_engine_outputs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    ticker TEXT NOT NULL,
    evaluated_at TIMESTAMPTZ NOT NULL,
    price NUMERIC NOT NULL,
    
    -- Heuristic pipeline
    heuristic_verdict TEXT NOT NULL,
    heuristic_confidence NUMERIC NOT NULL,
    heuristic_s_total NUMERIC NOT NULL,
    
    -- Probabilistic pipeline
    probabilistic_verdict TEXT NOT NULL,
    probabilistic_p_up NUMERIC NOT NULL,
    probabilistic_entropy NUMERIC NOT NULL,
    probabilistic_ev_r NUMERIC NOT NULL,
    
    -- Delta analysis
    delta_agreement BOOLEAN NOT NULL,
    delta_confidence_delta NUMERIC NOT NULL,
    delta_reasons JSONB NOT NULL DEFAULT '[]'::jsonb,
    
    -- Trade plan (null when no BUY verdict)
    trade_plan JSONB,
    
    -- Full output for audit
    full_output JSONB NOT NULL,
    
    -- Exit signals
    exit_signals JSONB NOT NULL DEFAULT '[]'::jsonb,
    
    -- Metadata
    pipeline_mode TEXT NOT NULL DEFAULT 'dual_pipeline',
    shadow_mode BOOLEAN NOT NULL DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Index for per-ticker time-range queries
CREATE INDEX IF NOT EXISTS idx_signal_engine_outputs_ticker_time
    ON signal_engine_outputs (ticker, evaluated_at);

-- Index for global time-range queries
CREATE INDEX IF NOT EXISTS idx_signal_engine_outputs_evaluated
    ON signal_engine_outputs (evaluated_at);

-- Index for filtering by verdict
CREATE INDEX IF NOT EXISTS idx_signal_engine_outputs_verdicts
    ON signal_engine_outputs (heuristic_verdict, probabilistic_verdict);
```

### Helm / Deployment Configuration

Add to `values.yaml` under `services:`:

```yaml
signalEngine:
  replicas: 1
  pipeline: true
  image: signal-engine
  command: "python -m services.signal_engine.main"
  tier: processing
  secrets: [stonks-core-secrets, stonks-market-secrets]
  resources:
    requests: { cpu: 100m, memory: 128Mi }
    limits: { cpu: 500m, memory: 256Mi }
```

Add to `redis_keys.py`:

```python
QUEUE_SIGNAL_ENGINE = "signal_engine"
```

The service uses the existing `stonks-config` ConfigMap and `stonks-core-secrets` for database/Redis credentials. No new ingress or network policy is needed — the signal engine is a queue-polling worker with no HTTP interface.

---