feat: signal math upgrade — probabilistic, regime-aware scoring pipeline

Implement the full probabilistic signal-processing pipeline, gated behind
the probabilistic_scoring_enabled feature flag in risk_configs:

- Bayesian log-likelihood accumulator with Beta posterior and entropy
- Regime detector (trend-following, panic, mean-reversion, uncertainty)
- Source accuracy tracker with per-source historical prediction accuracy
- Sigmoid confidence gate replacing binary gate
- Information gain surprise weighting for rare events
- Adaptive recency decay with event-specific half-lives
- Regime multiplier replacing market context multiplier
- Weighted disagreement entropy for contradiction detection
- Multiplicative macro exposure with conditional integration
- Graph-distance attenuated competitive signal propagation
- Exponentially weighted momentum with volatility scaling
- Expected value recommendation gate

All changes are backward-compatible: flag=false preserves the exact current
behavior. New outputs are stored in existing JSONB columns (no schema changes
apart from the source_accuracy table, via migration 034).

Tests: 26 property-based tests (14 correctness properties), 99 unit tests,
1789 total tests passing with zero regressions.
Celes Renata
2026-04-29 11:41:48 +00:00
parent 8c3c1aab43
commit 4e010bc048
24 changed files with 6058 additions and 60 deletions
+127
@@ -0,0 +1,127 @@
"""Bayesian accumulator for probabilistic sentiment aggregation.
Accumulates weighted signals into a Bayesian posterior using
log-likelihood accumulation, Beta distribution parameters, and
Shannon entropy for mixed-signal detection.
Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 9.1, 9.7
"""
from __future__ import annotations
import math
from dataclasses import dataclass
from services.aggregation.scoring import WeightedSignal
@dataclass(frozen=True)
class BayesianPosterior:
"""Bayesian posterior state from signal accumulation."""
p_bull: float # σ(L_t), bullish probability [0, 1]
alpha: float # Beta distribution α parameter (≥ 1.0)
beta: float # Beta distribution β parameter (≥ 1.0)
log_likelihood: float # Raw log-likelihood accumulation L_t
bayesian_confidence: float # 1 - 4αβ/(α+β)², [0, 1]
entropy: float # Shannon entropy H, [0, 1]
signal_count: int # Number of signals processed
# Uninformative prior (no evidence)
PRIOR = BayesianPosterior(
p_bull=0.5,
alpha=1.0,
beta=1.0,
log_likelihood=0.0,
bayesian_confidence=0.0,
entropy=1.0,
signal_count=0,
)
def compute_entropy(p_bull: float) -> float:
"""Shannon entropy H = -p·log₂(p) - (1-p)·log₂(1-p).
Returns value in [0, 1]. Maximum at p=0.5, zero at p=0 or p=1.
Handles edge cases p≤0 and p≥1 by returning 0.0.
"""
if p_bull <= 0.0 or p_bull >= 1.0:
return 0.0
q = 1.0 - p_bull
return -(p_bull * math.log2(p_bull) + q * math.log2(q))
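As a quick sanity check of the binary entropy formula above, here is a minimal standalone sketch. The function is restated under the hypothetical name `binary_entropy` so the snippet runs without the `services` package, and the sample probabilities are purely illustrative.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy of a Bernoulli(p) variable, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


# Evenly split evidence is maximally uncertain; a confident posterior is not.
for p in (0.5, 0.75, 0.9, 1.0):
    print(f"p_bull={p:.2f} -> H={binary_entropy(p):.4f}")
```

The entropy is symmetric around 0.5, so a strongly bearish posterior (p_bull near 0) scores as low as a strongly bullish one.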
def compute_bayesian_posterior(
signals: list[WeightedSignal],
) -> BayesianPosterior:
"""Accumulate weighted signals into a Bayesian posterior.
Computes:
- Log-likelihood: L_t = Σ(w_i · s_i)
- Bullish probability: P_bull = σ(L_t)
- Beta posterior: α = 1 + W_bull, β = 1 + W_bear
- Bayesian confidence: C = 1 - 4αβ/(α+β)²
- Shannon entropy: H = -p·log₂(p) - (1-p)·log₂(1-p)
Returns PRIOR for empty signal lists.
Skips signals with NaN weight or sentiment.
"""
if not signals:
return PRIOR
log_likelihood = 0.0
w_bull = 0.0
w_bear = 0.0
count = 0
for sig in signals:
combined = sig.weight.combined
sentiment = sig.sentiment_value
# Skip signals with NaN weight or sentiment
if math.isnan(combined) or math.isnan(sentiment):
continue
log_likelihood += combined * sentiment
if sentiment > 0.0:
w_bull += combined
elif sentiment < 0.0:
w_bear += combined
count += 1
if count == 0:
return PRIOR
# P_bull via sigmoid: σ(L_t) = 1 / (1 + exp(-L_t))
# Guard against overflow in exp for very large |L_t|
if log_likelihood > 500.0:
p_bull = 1.0
elif log_likelihood < -500.0:
p_bull = 0.0
else:
p_bull = 1.0 / (1.0 + math.exp(-log_likelihood))
# Beta posterior parameters
alpha = 1.0 + w_bull
beta_param = 1.0 + w_bear
# Bayesian confidence: C = 1 - 4αβ/(α+β)²
ab_sum = alpha + beta_param
bayesian_confidence = 1.0 - (4.0 * alpha * beta_param) / (ab_sum * ab_sum)
# Clamp to [0, 1] to guard against floating-point rounding
bayesian_confidence = max(0.0, min(1.0, bayesian_confidence))
# Shannon entropy
entropy = compute_entropy(p_bull)
return BayesianPosterior(
p_bull=p_bull,
alpha=alpha,
beta=beta_param,
log_likelihood=log_likelihood,
bayesian_confidence=bayesian_confidence,
entropy=entropy,
signal_count=count,
)
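The accumulation above can be traced by hand on a small input. The sketch below restates the formulas on plain floats. `ToySignal` and `toy_posterior` are hypothetical stand-ins (the real `WeightedSignal` exposes `weight.combined` and `sentiment_value`), and the NaN filtering and overflow guard are omitted for brevity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySignal:
    """Hypothetical stand-in for WeightedSignal: combined weight and sentiment."""
    weight: float
    sentiment: float


def toy_posterior(signals: list[ToySignal]) -> dict[str, float]:
    """Restates the accumulation formulas from the diff on plain floats."""
    log_likelihood = sum(s.weight * s.sentiment for s in signals)
    w_bull = sum(s.weight for s in signals if s.sentiment > 0.0)
    w_bear = sum(s.weight for s in signals if s.sentiment < 0.0)
    alpha, beta = 1.0 + w_bull, 1.0 + w_bear
    p_bull = 1.0 / (1.0 + math.exp(-log_likelihood))
    confidence = 1.0 - 4.0 * alpha * beta / (alpha + beta) ** 2
    return {"p_bull": p_bull, "alpha": alpha, "beta": beta, "confidence": confidence}


# Two bullish signals and one weaker bearish one.
result = toy_posterior([
    ToySignal(weight=1.0, sentiment=0.8),
    ToySignal(weight=0.5, sentiment=0.6),
    ToySignal(weight=0.5, sentiment=-0.4),
])
print(result)
```

Here L_t = 0.9, α = 2.5 and β = 1.5. Note that 1 - 4αβ/(α+β)² is algebraically equal to ((α - β)/(α + β))², so the confidence term measures the imbalance between bullish and bearish weight rather than the total amount of evidence.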
+71 -2
@@ -4,10 +4,11 @@ Analyses weighted signals to detect and represent disagreement explicitly,
rather than collapsing contradictory evidence into a single unsupported
conclusion.
-Requirements: 6.4, 6.5
+Requirements: 6.4, 6.5, 15.1–15.7
"""
from __future__ import annotations
import math
from dataclasses import dataclass
from services.aggregation.scoring import WeightedSignal
@@ -35,6 +36,9 @@ class ContradictionResult:
def detect_contradictions(
signals: list[WeightedSignal],
catalyst_entries: list[CatalystEntry] | None = None,
*,
probabilistic: bool = False,
w_threshold: float = 5.0,
) -> ContradictionResult:
"""Run contradiction detection across multiple dimensions.
@@ -42,6 +46,16 @@ def detect_contradictions(
1. Sentiment disagreement — the core positive-vs-negative split
2. Catalyst disagreement — same catalyst type with opposing sentiment
When ``probabilistic`` is True, the overall score uses weighted
disagreement entropy (Req 15.1–15.7) instead of the minority/majority
ratio. When False, the existing ratio formula is preserved exactly.
Args:
signals: Weighted signals to analyse.
catalyst_entries: Optional catalyst metadata for per-catalyst analysis.
probabilistic: Use entropy-based scoring when True.
w_threshold: Evidence mass threshold for entropy weighting (default 5.0).
Returns a ContradictionResult with an overall score and per-dimension
disagreement details.
"""
@@ -55,7 +69,10 @@ def detect_contradictions(
catalyst_details = _detect_catalyst_disagreement(signals, catalyst_entries)
details.extend(catalyst_details)
-score = _compute_overall_score(signals)
+if probabilistic:
+    score = _compute_entropy_score(signals, w_threshold)
+else:
+    score = _compute_overall_score(signals)
return ContradictionResult(score=score, details=details)
@@ -82,6 +99,58 @@ def _compute_overall_score(signals: list[WeightedSignal]) -> float:
return round(minority / total, 4)
def _compute_entropy_score(
signals: list[WeightedSignal],
w_threshold: float = 5.0,
) -> float:
"""Weighted disagreement entropy — probabilistic contradiction score.
Computes Shannon entropy over the positive/negative weight distribution,
weighted by evidence mass relative to a configurable threshold.
Formula:
f_pos = W_pos / (W_pos + W_neg)
f_neg = 1 - f_pos
H = -f_pos·log₂(f_pos) - f_neg·log₂(f_neg) (in [0, 1])
score = H · min(1.0, (W_pos + W_neg) / W_threshold)
Returns 0.0 when only one direction exists (no disagreement).
Requirements: 15.1–15.7
"""
if not signals:
return 0.0
pos_weight = 0.0
neg_weight = 0.0
for sig in signals:
w = sig.weight.combined * sig.impact_score
if sig.sentiment_value > 0:
pos_weight += w
elif sig.sentiment_value < 0:
neg_weight += w
# No disagreement when only one direction exists (Req 15.5)
if pos_weight <= 0.0 or neg_weight <= 0.0:
return 0.0
total = pos_weight + neg_weight
# Compute weight fractions (Req 15.2)
f_pos = pos_weight / total
f_neg = neg_weight / total # = 1 - f_pos
# Shannon entropy H = -f_pos·log₂(f_pos) - f_neg·log₂(f_neg) (Req 15.3)
# Guard against log₂(0) — already handled by the early return above
h_contradiction = -f_pos * math.log2(f_pos) - f_neg * math.log2(f_neg)
# Weight by evidence mass (Req 15.4)
evidence_factor = min(1.0, total / w_threshold) if w_threshold > 0.0 else 1.0
score = h_contradiction * evidence_factor
return round(score, 4)
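To see how the evidence-mass factor damps the entropy, the formula can be exercised directly on aggregate weights. This is a minimal sketch: `entropy_score` is a hypothetical helper that takes the already-summed W_pos and W_neg instead of a signal list.

```python
import math


def entropy_score(w_pos: float, w_neg: float, w_threshold: float = 5.0) -> float:
    """Disagreement entropy scaled by evidence mass, as in the formula above."""
    if w_pos <= 0.0 or w_neg <= 0.0:
        return 0.0
    total = w_pos + w_neg
    f_pos = w_pos / total
    f_neg = w_neg / total
    h = -f_pos * math.log2(f_pos) - f_neg * math.log2(f_neg)
    evidence = min(1.0, total / w_threshold) if w_threshold > 0.0 else 1.0
    return round(h * evidence, 4)


print(entropy_score(3.0, 3.0))  # even split, mass above the threshold
print(entropy_score(1.0, 1.0))  # even split, thin evidence
print(entropy_score(4.0, 0.0))  # one-sided, no disagreement
```

An even split only reaches the maximum score once the total weight meets the threshold; the same split on thin evidence is scaled down proportionally.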
def _detect_sentiment_disagreement(
signals: list[WeightedSignal],
) -> DisagreementDetail | None:
+233 -16
@@ -283,27 +283,82 @@ def _determine_impact_direction(
# ---------------------------------------------------------------------------
def _compute_multiplicative_exposure(
geo_overlap: float,
supply_overlap: float,
commodity_overlap: float,
sector_match: float,
) -> float:
"""Compute multiplicative compounding exposure.
Formula: 1 - Π_k(1 - w_k · O_k)
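Exposures combine like independent probabilities (noisy-OR): every
additional dimension raises the total, but with diminishing returns,
so the result never exceeds the linear weighted sum.
Returns a value in [0, ~0.689] (max when all overlaps are 1.0, assuming
the 0.35/0.25/0.25/0.15 weights quoted in compute_macro_impact).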
Requirements: 10.1, 10.4, 10.7
"""
product = (
(1.0 - GEO_WEIGHT * geo_overlap)
* (1.0 - SUPPLY_WEIGHT * supply_overlap)
* (1.0 - COMMODITY_WEIGHT * commodity_overlap)
* (1.0 - SECTOR_WEIGHT * sector_match)
)
return 1.0 - product
def _compute_linear_exposure(
geo_overlap: float,
supply_overlap: float,
commodity_overlap: float,
sector_match: float,
) -> float:
"""Compute linear weighted-sum exposure (original heuristic formula).
Formula: w_geo·O_geo + w_supply·O_supply + w_commodity·O_commodity + w_sector·O_sector
Returns a value in [0, 1].
"""
return (
GEO_WEIGHT * geo_overlap
+ SUPPLY_WEIGHT * supply_overlap
+ COMMODITY_WEIGHT * commodity_overlap
+ SECTOR_WEIGHT * sector_match
)
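The two exposure formulas can be compared side by side. The sketch below assumes the weights 0.35/0.25/0.25/0.15 quoted in the `compute_macro_impact` docstring (the actual `GEO_WEIGHT` etc. constants are not shown in this diff), and the function names are hypothetical.

```python
import math

# Assumed weights (geo, supply, commodity, sector), taken from the docstring.
WEIGHTS = (0.35, 0.25, 0.25, 0.15)


def multiplicative(overlaps: tuple[float, ...]) -> float:
    """1 - prod_k(1 - w_k * O_k)."""
    return 1.0 - math.prod(1.0 - w * o for w, o in zip(WEIGHTS, overlaps))


def linear(overlaps: tuple[float, ...]) -> float:
    """sum_k(w_k * O_k)."""
    return sum(w * o for w, o in zip(WEIGHTS, overlaps))


for overlaps in [(1.0, 0.0, 0.0, 0.0), (0.5, 0.5, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0)]:
    print(overlaps, round(multiplicative(overlaps), 4), round(linear(overlaps), 4))
```

With a single exposed dimension the two formulas agree. With several, the multiplicative form grows more slowly than the sum and tops out near 0.689 under these weights, so scores in probabilistic mode are not directly comparable to the linear ones.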
def compute_macro_impact(
event: GlobalEvent,
profile: ExposureProfileSchema,
*,
probabilistic: bool = False,
) -> MacroImpactRecord:
"""Compute the macro impact of a global event on a company.
Scoring formula:
When ``probabilistic=False`` (default), uses the linear weighted-sum:
raw_score = severity_weight * (
0.35 * geographic_overlap +
0.25 * supply_chain_overlap +
0.25 * commodity_overlap +
0.15 * sector_match
)
final_score = apply_resilience_modifier(raw_score, tier, is_international)
When ``probabilistic=True``, uses multiplicative compounding exposure:
raw_score = severity_weight * (1 - Π_k(1 - w_k · O_k))
In both modes, the resilience modifier is applied after the raw score.
Args:
event: The classified global event.
profile: The company's exposure profile.
probabilistic: Use multiplicative formula when True.
Returns:
A MacroImpactRecord with the computed score and metadata.
Requirements: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6
"""
now = datetime.now(timezone.utc)
@@ -360,13 +415,16 @@ def compute_macro_impact(
# Severity weight
severity_weight = SEVERITY_WEIGHTS.get(event.severity, 0.25)
-# Raw score
-raw_score = severity_weight * (
-    GEO_WEIGHT * geo_overlap
-    + SUPPLY_WEIGHT * supply_overlap
-    + COMMODITY_WEIGHT * commodity_overlap
-    + SECTOR_WEIGHT * sector_match
-)
+# Raw score: multiplicative or linear depending on mode
+if probabilistic:
+    exposure = _compute_multiplicative_exposure(
+        geo_overlap, supply_overlap, commodity_overlap, sector_match,
+    )
+else:
+    exposure = _compute_linear_exposure(
+        geo_overlap, supply_overlap, commodity_overlap, sector_match,
+    )
+raw_score = severity_weight * exposure
# Determine if event is international (affects multiple regions)
is_international = len(event.affected_regions) > 1
@@ -406,19 +464,27 @@ def compute_macro_impact_with_sector(
event: GlobalEvent,
profile: ExposureProfileSchema,
company_sector: str = "",
*,
probabilistic: bool = False,
) -> MacroImpactRecord:
"""Compute macro impact with explicit sector matching.
Like compute_macro_impact but accepts a company_sector parameter
for proper sector_match computation.
When ``probabilistic=True``, uses multiplicative compounding exposure.
When ``probabilistic=False``, uses the original linear weighted sum.
Args:
event: The classified global event.
profile: The company's exposure profile.
company_sector: The company's GICS sector name.
probabilistic: Use multiplicative formula when True.
Returns:
A MacroImpactRecord with the computed score and metadata.
Requirements: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6
"""
now = datetime.now(timezone.utc)
@@ -472,13 +538,16 @@ def compute_macro_impact_with_sector(
# Severity weight
severity_weight = SEVERITY_WEIGHTS.get(event.severity, 0.25)
-# Raw score
-raw_score = severity_weight * (
-    GEO_WEIGHT * geo_overlap
-    + SUPPLY_WEIGHT * supply_overlap
-    + COMMODITY_WEIGHT * commodity_overlap
-    + SECTOR_WEIGHT * sector_match
-)
+# Raw score: multiplicative or linear depending on mode
+if probabilistic:
+    exposure = _compute_multiplicative_exposure(
+        geo_overlap, supply_overlap, commodity_overlap, sector_match,
+    )
+else:
+    exposure = _compute_linear_exposure(
+        geo_overlap, supply_overlap, commodity_overlap, sector_match,
+    )
+raw_score = severity_weight * exposure
# International check
is_international = len(event.affected_regions) > 1
@@ -588,6 +657,154 @@ def _infer_commodities(sector: str, industry: str) -> list[str]:
return sector_commodities.get(sector, [])
# ---------------------------------------------------------------------------
# Conditional macro signal integration (Requirements: 11.1–11.5)
# ---------------------------------------------------------------------------
def compute_conditional_macro_modifier(
company_strength: float,
company_direction: str,
macro_impact: float,
macro_direction: str,
) -> float:
"""Compute the multiplicative macro modifier for conditional integration.
When both company and macro signals exist, macro acts as a modifier:
S_adjusted = S_company · clamp(1 + M_macro · sign_alignment, 0.5, 1.5)
sign_alignment is +1 when macro and company agree in direction,
-1 when they disagree.
Args:
company_strength: The company-level signal strength (absolute).
company_direction: Company trend direction (bullish/bearish/neutral/mixed).
macro_impact: Normalized macro impact score in [0, 1].
macro_direction: Macro impact direction (positive/negative/mixed/neutral).
Returns:
The multiplicative modifier in [0.5, 1.5].
Requirements: 11.1, 11.2
"""
# Determine sign alignment between company and macro directions
_DIRECTION_SIGN = {
"bullish": 1,
"positive": 1,
"bearish": -1,
"negative": -1,
}
company_sign = _DIRECTION_SIGN.get(company_direction, 0)
macro_sign = _DIRECTION_SIGN.get(macro_direction, 0)
if company_sign == 0 or macro_sign == 0:
# Neutral or mixed directions — no alignment signal
sign_alignment = 0.0
elif company_sign == macro_sign:
sign_alignment = 1.0
else:
sign_alignment = -1.0
raw_modifier = 1.0 + macro_impact * sign_alignment
return max(0.5, min(1.5, raw_modifier))
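A few concrete inputs make the clamping behaviour of the modifier easier to see. The sketch below restates the logic in a hypothetical `macro_modifier` helper and drops the unused `company_strength` argument.

```python
def macro_modifier(company_direction: str, macro_direction: str, macro_impact: float) -> float:
    """clamp(1 + M_macro * sign_alignment, 0.5, 1.5), as in the formula above."""
    signs = {"bullish": 1, "positive": 1, "bearish": -1, "negative": -1}
    c = signs.get(company_direction, 0)
    m = signs.get(macro_direction, 0)
    if c == 0 or m == 0:
        alignment = 0.0  # neutral or mixed: macro has no effect
    elif c == m:
        alignment = 1.0
    else:
        alignment = -1.0
    return max(0.5, min(1.5, 1.0 + macro_impact * alignment))


print(macro_modifier("bullish", "positive", 0.3))  # agreement amplifies
print(macro_modifier("bullish", "negative", 0.3))  # disagreement dampens
print(macro_modifier("bullish", "negative", 0.8))  # clamped at the 0.5 floor
print(macro_modifier("mixed", "positive", 0.8))    # no alignment signal
```

Because of the clamp, any macro impact above 0.5 produces the same modifier as 0.5 itself, in either direction.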
def integrate_macro_signals(
company_signals: list,
macro_signals: list,
company_direction: str,
macro_impacts: list,
ticker: str = "",
*,
probabilistic: bool = False,
macro_signal_weight: float = 0.3,
) -> tuple[list, float]:
"""Integrate macro signals with company signals.
When ``probabilistic=True``:
- Both exist: apply macro as multiplicative modifier on company signals
- Only macro: fall back to additive behavior with weight 0.3
- Only company: use modifier = 1.0 (no change)
When ``probabilistic=False``:
- Preserve current additive merge behavior (concatenate lists)
Args:
company_signals: WeightedSignal list from company layer.
macro_signals: WeightedSignal list from macro layer.
company_direction: Derived company trend direction string.
macro_impacts: List of MacroImpactRecord or similar with
macro_impact_score and impact_direction attributes.
ticker: Ticker symbol for logging.
probabilistic: Use conditional modifier when True.
macro_signal_weight: Weight for macro-only fallback (default 0.3).
Returns:
Tuple of (merged_signals, macro_modifier_applied).
macro_modifier_applied is 1.0 when no modifier was used.
Requirements: 11.1, 11.2, 11.3, 11.4, 11.5
"""
if not probabilistic:
# Heuristic mode: simple additive merge (current behavior)
merged = list(company_signals) + list(macro_signals)
return merged, 1.0
has_company = len(company_signals) > 0
has_macro = len(macro_signals) > 0
if has_company and has_macro:
# Compute average macro impact and dominant direction
avg_macro_impact = 0.0
direction_counts: dict[str, float] = {}
for mir in macro_impacts:
score = getattr(mir, "macro_impact_score", 0.0)
direction = getattr(mir, "impact_direction", "neutral")
avg_macro_impact += score
direction_counts[direction] = direction_counts.get(direction, 0.0) + score
if macro_impacts:
avg_macro_impact /= len(macro_impacts)
# Dominant macro direction by total impact weight
macro_direction = max(direction_counts, key=direction_counts.get) if direction_counts else "neutral"
modifier = compute_conditional_macro_modifier(
company_strength=0.0, # not used in current formula
company_direction=company_direction,
macro_impact=avg_macro_impact,
macro_direction=macro_direction,
)
logger.info(
"Macro modifier for %s: %.4f (avg_impact=%.4f, macro_dir=%s, company_dir=%s)",
ticker, modifier, avg_macro_impact, macro_direction, company_direction,
)
# Apply modifier to company signals by scaling their impact scores
# We create modified copies rather than mutating originals
from copy import copy
modified_signals = []
for sig in company_signals:
new_sig = copy(sig)
new_sig.impact_score = sig.impact_score * modifier
modified_signals.append(new_sig)
return modified_signals, modifier
if has_macro and not has_company:
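# Macro-only fallback (Req 11.3). Note: macro_signal_weight is only logged
# here; the signals are returned unscaled, so the 0.3 weighting has to be
# applied by the caller.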
logger.info(
"Macro-only fallback for %s: using additive merge with weight %.2f",
ticker, macro_signal_weight,
)
return list(macro_signals), 1.0
# Company-only: no modification (Req 11.4)
logger.info("Company-only signals for %s: macro modifier=1.0", ticker)
return list(company_signals), 1.0
# ---------------------------------------------------------------------------
# PostgreSQL persistence
# ---------------------------------------------------------------------------
+82 -1
@@ -4,7 +4,7 @@ Computes TrendProjection objects by combining current trend momentum,
macro signal decay trajectories, and upcoming catalyst outlook.
Projections are persisted alongside trend_window records.
-Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.9
+Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.9, 13.1, 13.2, 13.3, 13.4, 13.5, 13.6
"""
from __future__ import annotations
@@ -126,6 +126,87 @@ def _direction_sign(direction: str) -> float:
return 0.0
# ---------------------------------------------------------------------------
# Exponentially weighted momentum (Requirements: 13.1–13.6)
# ---------------------------------------------------------------------------
def compute_ew_momentum(
strength_changes: list[float],
lambda_decay: float = 0.7,
) -> float:
"""Compute exponentially weighted momentum from historical strength changes.
Formula: M_t = Σ_{k=0}^{K-1} λ^k · ΔS_{t-k}
Normalized by geometric series sum Σ λ^k to produce value in [-1, 1].
When fewer than 2 historical cycles are available, returns 0.0
(caller should fall back to heuristic).
Args:
strength_changes: List of signed strength changes ΔS, most recent first.
Each value represents the change in signed trend strength from one
cycle to the next. Positive = strengthening bullish / weakening bearish.
lambda_decay: Decay factor λ (default 0.7). Must be in (0, 1).
Returns:
Normalized momentum in [-1, 1]. Returns 0.0 for empty or single-element lists.
Requirements: 13.1, 13.2, 13.3, 13.6
"""
if len(strength_changes) < 2:
return 0.0
# Use up to K=10 most recent changes, filtering out NaN values
k_max = min(len(strength_changes), 10)
changes = strength_changes[:k_max]
weighted_sum = 0.0
weight_sum = 0.0
for k, delta_s in enumerate(changes):
if math.isnan(delta_s):
continue
w = lambda_decay ** k
weighted_sum += w * delta_s
weight_sum += w
if weight_sum == 0.0:
return 0.0
normalized = weighted_sum / weight_sum
# Guard against NaN propagation
if math.isnan(normalized) or math.isinf(normalized):
return 0.0
return max(-1.0, min(1.0, normalized))
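A short worked example shows how the decay weights favour recent changes. The sketch restates the computation in a hypothetical `ew_momentum` helper; the input series is made up.

```python
import math


def ew_momentum(changes: list[float], lam: float = 0.7, k_max: int = 10) -> float:
    """Decay-weighted average of strength changes, most recent first."""
    if len(changes) < 2:
        return 0.0
    weighted_sum = 0.0
    weight_sum = 0.0
    for k, delta in enumerate(changes[:k_max]):
        if math.isnan(delta):
            continue  # skipped entries keep their position-based weight out of both sums
        w = lam ** k
        weighted_sum += w * delta
        weight_sum += w
    if weight_sum == 0.0:
        return 0.0
    return max(-1.0, min(1.0, weighted_sum / weight_sum))


recent = [0.20, 0.10, -0.10]  # most recent change first
print(round(ew_momentum(recent), 4))
print(round(ew_momentum(list(reversed(recent))), 4))
print(ew_momentum([0.20]))  # fewer than two cycles
```

With λ = 0.7 the weights are 1, 0.7 and 0.49, so the same three changes give a higher momentum when the positive ones are the most recent.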
def compute_volatility_scaled_momentum(
momentum: float,
sigma_20: float,
) -> float:
"""Compute volatility-scaled momentum.
Formula: M_adj = M_t / max(σ_20, 0.01), clamped to [-2.0, 2.0].
Normalizes momentum relative to the ticker's typical price movement.
Args:
momentum: Raw or EW momentum value.
sigma_20: 20-day return standard deviation.
Returns:
Volatility-scaled momentum in [-2.0, 2.0].
Requirements: 13.4, 13.5
"""
denominator = max(sigma_20, 0.01)
scaled = momentum / denominator
# Guard against NaN propagation
if math.isnan(scaled) or math.isinf(scaled):
return 0.0
return max(-2.0, min(2.0, scaled))
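The volatility scaling can be checked the same way. In this minimal sketch `vol_scaled` is a hypothetical restatement of the function above and the σ values are illustrative.

```python
import math


def vol_scaled(momentum: float, sigma_20: float) -> float:
    """M_adj = M_t / max(sigma_20, 0.01), clamped to [-2, 2]."""
    scaled = momentum / max(sigma_20, 0.01)
    if math.isnan(scaled) or math.isinf(scaled):
        return 0.0
    return max(-2.0, min(2.0, scaled))


print(vol_scaled(0.10, 0.20))   # 0.5
print(vol_scaled(0.10, 0.02))   # 5.0 before clamping, so the cap applies
print(vol_scaled(-0.10, 0.0))   # sigma floored at 0.01, then clamped
```

If `sigma_20` is a daily return standard deviation of a few percent, momentum values above roughly 2·σ already hit the ±2.0 clamp, so the output may saturate for many tickers; whether that is intended depends on the units of the inputs, which this diff does not state.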
# ---------------------------------------------------------------------------
# Macro signal decay projection
# ---------------------------------------------------------------------------
+170
@@ -0,0 +1,170 @@
"""Regime detector for market regime classification.
Classifies the current market regime for each ticker based on
EMA trend indicators and volatility ratios. Adjusts scoring
thresholds and contradiction penalties per regime.
Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.9
"""
from __future__ import annotations
import math
import statistics
from dataclasses import dataclass
from enum import Enum
class MarketRegime(str, Enum):
"""Market regime classification categories."""
TREND_FOLLOWING = "trend_following"
PANIC = "panic"
MEAN_REVERSION = "mean_reversion"
UNCERTAINTY = "uncertainty"
@dataclass(frozen=True)
class RegimeClassification:
"""Result of regime detection for a ticker."""
regime: MarketRegime
trend_indicator: float # R = sign(EMA_20 - EMA_100)
volatility_ratio: float # V_r = σ_20 / σ_100
bullish_threshold: float # Adjusted ±threshold for direction
bearish_threshold: float
contradiction_penalty_multiplier: float # 0.4 default, 0.6 for uncertainty
@dataclass(frozen=True)
class RegimeConfig:
"""Configuration parameters for regime detection."""
ema_short_period: int = 20
ema_long_period: int = 100
vol_short_period: int = 20
vol_long_period: int = 100
panic_vol_ratio: float = 1.5
trend_vol_ratio: float = 1.2
mean_reversion_vol_ratio: float = 1.0
default_threshold: float = 0.15
panic_threshold: float = 0.10
mean_reversion_threshold: float = 0.20
uncertainty_contradiction_multiplier: float = 0.6
# Default uncertainty classification used when data is insufficient
_DEFAULT_UNCERTAINTY = RegimeClassification(
regime=MarketRegime.UNCERTAINTY,
trend_indicator=0.0,
volatility_ratio=1.0,
bullish_threshold=0.15,
bearish_threshold=-0.15,
contradiction_penalty_multiplier=0.6,
)
def compute_ema(values: list[float], period: int) -> float:
"""Compute exponential moving average over the last ``period`` values.
Uses the standard EMA formula with multiplier = 2 / (period + 1).
Iterates through the values, seeding the EMA with the first value.
Raises ``ValueError`` when *values* is empty or *period* < 1.
"""
if not values or period < 1:
raise ValueError("values must be non-empty and period must be >= 1")
# Use only the last `period` values (or all if fewer)
data = values[-period:] if len(values) >= period else values
multiplier = 2.0 / (period + 1)
ema = data[0]
for value in data[1:]:
ema = (value - ema) * multiplier + ema
return ema
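The windowing in `compute_ema` is worth tracing once, because only the last `period` values contribute and the first of those seeds the average. A minimal sketch, with the hypothetical name `ema_last` and made-up prices:

```python
def ema_last(values: list[float], period: int) -> float:
    """EMA over the trailing window, seeded with the window's first value."""
    if not values or period < 1:
        raise ValueError("values must be non-empty and period must be >= 1")
    data = values[-period:]  # the whole list when it is shorter than the period
    multiplier = 2.0 / (period + 1)
    ema = data[0]
    for value in data[1:]:
        ema = (value - ema) * multiplier + ema
    return ema


prices = [1.0, 2.0, 3.0, 4.0, 5.0]
print(ema_last(prices, 3))           # window [3, 4, 5]: 3 -> 3.5 -> 4.25
print(ema_last([3.0, 4.0, 5.0], 3))  # identical: older values are ignored
```

This is a truncated EMA, so it differs from a running EMA carried over the full history; the two converge as the window grows relative to the smoothing period.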
def _sign(x: float) -> float:
"""Return -1.0, 0.0, or 1.0 for the sign of *x*."""
if x > 0.0:
return 1.0
if x < 0.0:
return -1.0
return 0.0
def classify_regime(
closing_prices: list[float],
returns: list[float],
config: RegimeConfig = RegimeConfig(),
) -> RegimeClassification:
"""Classify market regime from price and return history.
Requires at least ``config.ema_long_period`` days of price history
for EMA_100. Falls back to UNCERTAINTY when data is insufficient
or standard deviations are zero.
Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.9
"""
# Insufficient price data → uncertainty
if len(closing_prices) < config.ema_long_period:
return _DEFAULT_UNCERTAINTY
# Insufficient return data → uncertainty
if len(returns) < config.vol_long_period:
return _DEFAULT_UNCERTAINTY
# --- Trend indicator: R = sign(EMA_short - EMA_long) ---
ema_short = compute_ema(closing_prices, config.ema_short_period)
ema_long = compute_ema(closing_prices, config.ema_long_period)
trend_indicator = _sign(ema_short - ema_long)
# --- Volatility ratio: V_r = σ_short / σ_long ---
short_returns = returns[-config.vol_short_period:]
long_returns = returns[-config.vol_long_period:]
# statistics.stdev needs at least two data points
if len(short_returns) < 2 or len(long_returns) < 2:
return _DEFAULT_UNCERTAINTY
sigma_short = statistics.stdev(short_returns)
sigma_long = statistics.stdev(long_returns)
if sigma_long == 0.0 or sigma_short == 0.0:
return _DEFAULT_UNCERTAINTY
if math.isnan(sigma_short) or math.isnan(sigma_long):
return _DEFAULT_UNCERTAINTY
volatility_ratio = sigma_short / sigma_long
# --- Classification rules (Req 7.3) ---
# Panic takes priority: V_r > 1.5
if volatility_ratio > config.panic_vol_ratio:
regime = MarketRegime.PANIC
threshold = config.panic_threshold # ±0.10
contradiction_mult = 0.4
# Trend-following: R ≠ 0 AND V_r < 1.2
elif trend_indicator != 0.0 and volatility_ratio < config.trend_vol_ratio:
regime = MarketRegime.TREND_FOLLOWING
threshold = config.default_threshold # ±0.15
contradiction_mult = 0.4
# Mean-reversion: R = 0 AND V_r < 1.0
elif trend_indicator == 0.0 and volatility_ratio < config.mean_reversion_vol_ratio:
regime = MarketRegime.MEAN_REVERSION
threshold = config.mean_reversion_threshold # ±0.20
contradiction_mult = 0.4
# Uncertainty: all other cases
else:
regime = MarketRegime.UNCERTAINTY
threshold = config.default_threshold # ±0.15
contradiction_mult = config.uncertainty_contradiction_multiplier # 0.6
return RegimeClassification(
regime=regime,
trend_indicator=trend_indicator,
volatility_ratio=volatility_ratio,
bullish_threshold=threshold,
bearish_threshold=-threshold,
contradiction_penalty_multiplier=contradiction_mult,
)
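The classification rules reduce to a small decision function of the trend indicator R and the volatility ratio V_r. The sketch below isolates that decision using the default thresholds from `RegimeConfig`; `regime_for` is a hypothetical helper and returns plain strings rather than `MarketRegime` members.

```python
def regime_for(trend_indicator: float, vol_ratio: float) -> str:
    """Apply the rule order from classify_regime: panic is checked first."""
    if vol_ratio > 1.5:
        return "panic"
    if trend_indicator != 0.0 and vol_ratio < 1.2:
        return "trend_following"
    if trend_indicator == 0.0 and vol_ratio < 1.0:
        return "mean_reversion"
    return "uncertainty"


for r, v in [(1.0, 1.8), (-1.0, 0.9), (0.0, 0.9), (1.0, 1.3), (0.0, 1.1)]:
    print(f"R={r:+.0f}, V_r={v:.1f} -> {regime_for(r, v)}")
```

Because R is the sign of a floating-point difference, R = 0 requires the two EMAs to be exactly equal, so the mean-reversion branch looks hard to reach with real prices. A tolerance band around zero might be what was intended, though that is a guess.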
+322 -16
@@ -4,7 +4,7 @@ integration for aggregation.
Provides scoring functions used by the aggregation engine to weight
document intelligence signals when computing trend summaries.
-Requirements: 6.1, 6.2, 6.5
+Requirements: 2.1–2.6, 3.1–3.5, 4.2–4.3, 5.1–5.7, 6.1–6.5, 16.4–16.5
"""
from __future__ import annotations
@@ -14,6 +14,24 @@ from datetime import datetime, timezone
from services.shared.schemas import MarketContext
# ---------------------------------------------------------------------------
# Event type base rates for information gain computation (Req 3.1)
# ---------------------------------------------------------------------------
EVENT_TYPE_BASE_RATES: dict[str, float] = {
"earnings": 0.25,
"product_launch": 0.10,
"regulatory": 0.08,
"legal": 0.05,
"m_and_a": 0.03,
"management_change": 0.06,
"partnership": 0.12,
"market_expansion": 0.09,
"restructuring": 0.04,
"dividend": 0.15,
}
DEFAULT_BASE_RATE = 0.1
@dataclass(frozen=True)
class ScoringConfig:
@@ -62,6 +80,37 @@ class ScoringConfig:
volume_surge_threshold_pct: float = 50.0
volume_surge_boost: float = 0.15
# --- Probabilistic scoring parameters ---
# Toggle: when True, use probabilistic formulas (sigmoid gate,
# adaptive decay, info gain, regime multiplier, source accuracy).
# When False, preserve exact current heuristic behaviour.
probabilistic: bool = False
# Sigmoid gate parameters — smooth replacement for binary confidence gate.
# Gate value: σ(k·(x - midpoint)) where k = steepness.
sigmoid_steepness: float = 5.0
sigmoid_midpoint: float = 0.5
# Information gain parameters — surprise weighting for rare events.
# r = 1 + λ·(-log₂ P(event_type)), clamped to info_gain_max.
info_gain_lambda: float = 0.3
info_gain_max: float = 3.0
default_base_rate: float = 0.1
# Adaptive decay parameters — β scaling factors for event-specific
# half-life adjustment: τ_i = τ_base · (1+β_impact)·(1+β_surprise)·(1+β_market).
adaptive_decay_impact_scale: float = 1.0
adaptive_decay_surprise_scale: float = 1.0
adaptive_decay_market_scale: float = 0.5
# Regime multiplier parameters — replaces market context multiplier.
# M_regime = 1 + regime_return_weight·|z_r| + regime_volume_weight·|z_v|,
# clamped to [1.0, regime_multiplier_max].
regime_return_weight: float = 0.15
regime_volume_weight: float = 0.10
regime_multiplier_max: float = 2.5
# Singleton default config
DEFAULT_CONFIG = ScoringConfig()
@@ -77,6 +126,8 @@ def recency_weight(
reference_time: datetime,
window: str,
config: ScoringConfig = DEFAULT_CONFIG,
*,
half_life_override: float | None = None,
) -> float:
"""Compute an exponential recency decay weight for a document.
@@ -87,6 +138,8 @@ def recency_weight(
reference_time: The "now" anchor for the aggregation window (tz-aware).
window: One of the TrendWindow values (e.g. "7d").
config: Scoring parameters.
half_life_override: If provided, use this half-life instead of the
window-based default (used for adaptive decay).
Returns:
A weight in [config.min_recency_weight, 1.0].
@@ -102,7 +155,7 @@ def recency_weight(
return 1.0
age_hours = age_seconds / 3600.0
-half_life = config.half_life_hours.get(window, 72.0)
+half_life = half_life_override if half_life_override is not None else config.half_life_hours.get(window, 72.0)
weight = math.pow(2.0, -age_hours / half_life)
return max(weight, config.min_recency_weight)
@@ -170,6 +223,188 @@ def market_context_multiplier(
return 1.0 + boost
# ---------------------------------------------------------------------------
# Sigmoid confidence gate (Req 2.1–2.6)
# ---------------------------------------------------------------------------
def sigmoid_gate(
x: float,
steepness: float = 5.0,
midpoint: float = 0.5,
) -> float:
"""Smooth sigmoid confidence gate: σ(k·(x - midpoint)).
Replaces the binary 0/1 confidence gate in probabilistic mode.
Returns a value in (0, 1) mathematically (exactly 0.0 or 1.0 only at the
overflow guards or through float rounding); higher confidence produces a
higher gate.
Args:
x: Extraction confidence value, typically in [0, 1].
steepness: Steepness parameter k (default 5.0).
midpoint: Midpoint of the sigmoid transition (default 0.5).
Returns:
Gate value in (0, 1).
"""
z = steepness * (x - midpoint)
# Guard against overflow in exp for very negative z
if z < -500.0:
return 0.0
if z > 500.0:
return 1.0
return 1.0 / (1.0 + math.exp(-z))
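Evaluating the gate at a few confidence levels shows how it differs from a hard cutoff. A minimal sketch with the default steepness 5.0 and midpoint 0.5; `gate` is a hypothetical restatement without the overflow guards.

```python
import math


def gate(x: float, steepness: float = 5.0, midpoint: float = 0.5) -> float:
    """Sigmoid gate sigma(k * (x - midpoint))."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


for confidence in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(f"confidence={confidence:.1f} -> gate={gate(confidence):.4f}")
```

With these defaults the gate never fully closes on the [0, 1] confidence range: a signal with confidence 0 still keeps about 7.6% of its weight, and one with confidence 1 keeps about 92.4%. A larger steepness moves the curve closer to the binary gate it replaces.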
# ---------------------------------------------------------------------------
# Information gain surprise weighting (Req 3.1–3.5)
# ---------------------------------------------------------------------------
def compute_info_gain(
event_type: str | None,
lambda_param: float = 0.3,
max_gain: float = 3.0,
default_base_rate: float = 0.1,
) -> float:
"""Compute information gain factor for an event type.
Formula: r = 1 + λ·(-log₂ P(event_type)), clamped to [1.0, max_gain].
Rarer events produce higher surprise weight. Unknown event types
use the default base rate.
Args:
event_type: Event type string (e.g. "earnings", "m_and_a").
lambda_param: Scaling parameter λ (default 0.3).
max_gain: Maximum clamp for the info gain factor (default 3.0).
default_base_rate: Fallback base rate for unknown event types.
Returns:
Information gain factor r in [1.0, max_gain].
"""
if event_type is None:
return 1.0
base_rate = EVENT_TYPE_BASE_RATES.get(event_type, default_base_rate)
# Guard against log₂(0) — base rates must be > 0
if base_rate <= 0.0:
base_rate = default_base_rate
if base_rate <= 0.0:
return 1.0
surprise = -math.log2(base_rate)
r = 1.0 + lambda_param * surprise
return min(max(r, 1.0), max_gain)
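A worked sketch of the surprise formula r = 1 + λ·(-log₂ P). The base rates below are invented for illustration; the real `EVENT_TYPE_BASE_RATES` table is not shown in this diff.

```python
import math

# Hypothetical base rates, for illustration only.
BASE_RATES = {"earnings": 0.25, "m_and_a": 0.02, "black_swan": 0.0005}


def info_gain(event_type, lam=0.3, max_gain=3.0, default_rate=0.1):
    """Surprise factor clamped to [1.0, max_gain]."""
    if event_type is None:
        return 1.0
    p = BASE_RATES.get(event_type, default_rate)
    if p <= 0.0:
        p = default_rate
    if p <= 0.0:
        return 1.0
    r = 1.0 + lam * -math.log2(p)
    return min(max(r, 1.0), max_gain)


r_common = info_gain("earnings")      # -log2(0.25) = 2 bits, so r = 1 + 0.3 * 2
r_rare = info_gain("m_and_a")         # about 5.64 bits of surprise
r_clamped = info_gain("black_swan")   # unclamped value exceeds max_gain
r_unknown = info_gain("guidance")     # falls back to the default base rate
```

The lower clamp matters only if a base rate above 1.0 slips in, which would otherwise make the surprise term negative.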
# ---------------------------------------------------------------------------
# Adaptive recency decay (Req 5.1–5.7)
# ---------------------------------------------------------------------------
def compute_adaptive_half_life(
base_half_life: float,
impact_score: float,
info_gain_factor: float,
market_multiplier: float,
config: ScoringConfig,
) -> float:
"""Compute adaptive half-life for event-specific recency decay.
Formula: τ_i = τ_base · (1 + β_impact) · (1 + β_surprise) · (1 + β_market)
The adaptive half-life is always >= base_half_life (decay is never faster).
Args:
base_half_life: Fixed half-life for the window (hours).
impact_score: Signal impact score in [0, 1].
info_gain_factor: Information gain factor r in [1.0, 3.0].
market_multiplier: Market context/regime multiplier in [1.0, ~2.5].
config: Scoring config with adaptive decay scale parameters.
Returns:
Adaptive half-life in hours, >= base_half_life.
"""
# β_impact: impact_score scaled linearly 0→0, 1→adaptive_decay_impact_scale
beta_impact = impact_score * config.adaptive_decay_impact_scale
# β_surprise: info_gain_factor scaled linearly r=1→0, r=3→adaptive_decay_surprise_scale
beta_surprise = ((info_gain_factor - 1.0) / 2.0) * config.adaptive_decay_surprise_scale
# β_market: market_multiplier scaled linearly 1.0→0, 1.45→adaptive_decay_market_scale
if market_multiplier > 1.0:
beta_market = ((market_multiplier - 1.0) / 0.45) * config.adaptive_decay_market_scale
else:
beta_market = 0.0
tau = base_half_life * (1.0 + beta_impact) * (1.0 + beta_surprise) * (1.0 + beta_market)
# Ensure adaptive half-life is never less than base (Property 5)
return max(tau, base_half_life)
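A worked example of the adaptive half-life formula. The three scale values in `DecayScales` are assumptions chosen for round numbers; the actual `ScoringConfig` defaults do not appear in this diff.

```python
from dataclasses import dataclass


@dataclass
class DecayScales:
    # Hypothetical scales; stand-ins for the adaptive_decay_*_scale config fields.
    impact_scale: float = 0.5
    surprise_scale: float = 0.5
    market_scale: float = 0.3


def adaptive_half_life(base, impact, info_gain, market_mult, s: DecayScales) -> float:
    beta_impact = impact * s.impact_scale
    beta_surprise = ((info_gain - 1.0) / 2.0) * s.surprise_scale
    beta_market = ((market_mult - 1.0) / 0.45) * s.market_scale if market_mult > 1.0 else 0.0
    tau = base * (1.0 + beta_impact) * (1.0 + beta_surprise) * (1.0 + beta_market)
    return max(tau, base)


# 72h base, high impact, moderately surprising event, mildly elevated regime:
# 72 * 1.4 * 1.25 * 1.15 = 144.9 hours.
tau = adaptive_half_life(72.0, 0.8, 2.0, 1.225, DecayScales())
# Neutral inputs leave the base half-life unchanged.
tau_neutral = adaptive_half_life(72.0, 0.0, 1.0, 1.0, DecayScales())
```

Because the three factors multiply, an event that is high on all three axes can have its half-life extended well beyond any single scale value.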
# ---------------------------------------------------------------------------
# Regime multiplier (Req 6.1–6.5)
# ---------------------------------------------------------------------------
def compute_regime_multiplier(
returns: list[float] | None,
volumes: list[float] | None,
config: ScoringConfig = DEFAULT_CONFIG,
) -> float:
"""Compute regime-aware multiplier from return and volume z-scores.
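Formula: M_regime = 1 + w_r·|z_r| + w_v·|z_v|, clamped to [1.0, max], where
w_r is config.regime_return_weight and w_v is config.regime_volume_weight
(0.15 and 0.10 in the reference formula).
Args:
returns: List of recent daily returns. The z-score uses the last 20 values,
or all of them when fewer are available; fewer than 2 usable values yields 1.0.
volumes: List of recent daily volumes, windowed the same way; fewer than 2
usable values leaves z_v at 0.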
config: Scoring config with regime multiplier parameters.
Returns:
Regime multiplier in [1.0, config.regime_multiplier_max].
"""
if not returns or len(returns) < 2:
return 1.0
# Filter out NaN values from returns
clean_returns = [r for r in returns if not math.isnan(r)]
if len(clean_returns) < 2:
return 1.0
# Return z-score: z_r = (r_t - μ_20) / σ_20
r_window = clean_returns[-20:] if len(clean_returns) >= 20 else clean_returns
r_t = clean_returns[-1]
mu_r = sum(r_window) / len(r_window)
var_r = sum((x - mu_r) ** 2 for x in r_window) / len(r_window)
sigma_r = math.sqrt(var_r)
z_r = 0.0
if sigma_r > 0.0:
z_r = (r_t - mu_r) / sigma_r
# Volume z-score: z_v = (log(V_t) - μ_V) / σ_V
z_v = 0.0
if volumes and len(volumes) >= 2:
clean_volumes = [v for v in volumes if not math.isnan(v)]
if len(clean_volumes) >= 2:
v_window = clean_volumes[-20:] if len(clean_volumes) >= 20 else clean_volumes
# Use log-volumes, guard against zero/negative volumes
log_vols = [math.log(max(v, 1.0)) for v in v_window]
log_v_t = math.log(max(clean_volumes[-1], 1.0))
mu_v = sum(log_vols) / len(log_vols)
var_v = sum((x - mu_v) ** 2 for x in log_vols) / len(log_vols)
sigma_v = math.sqrt(var_v)
if sigma_v > 0.0:
z_v = (log_v_t - mu_v) / sigma_v
m_regime = 1.0 + config.regime_return_weight * abs(z_r) + config.regime_volume_weight * abs(z_v)
# Guard against NaN propagation from upstream data
if math.isnan(m_regime) or math.isinf(m_regime):
return 1.0
return max(1.0, min(m_regime, config.regime_multiplier_max))
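The z-score computation above can be sketched with the standard library's population statistics. The weights 0.15 and 0.10 come from the docstring formula; the cap of 2.5 and the helper names are assumptions for this example.

```python
import math
from statistics import fmean, pstdev


def z_of_last(values, window=20):
    """Population z-score of the last value against the trailing window."""
    w = values[-window:]
    sigma = pstdev(w)
    return (w[-1] - fmean(w)) / sigma if sigma > 0.0 else 0.0


def regime_multiplier_sketch(returns, volumes, w_r=0.15, w_v=0.10, m_max=2.5):
    if len(returns) < 2:
        return 1.0
    z_r = z_of_last(returns)
    z_v = 0.0
    if len(volumes) >= 2:
        # Log-volumes, with volumes below 1 floored to avoid log(0).
        z_v = z_of_last([math.log(max(v, 1.0)) for v in volumes])
    m = 1.0 + w_r * abs(z_r) + w_v * abs(z_v)
    return max(1.0, min(m, m_max))


# A single 2% move after 19 flat days, with flat volume.
shock = regime_multiplier_sketch([0.0] * 19 + [0.02], [1_000_000.0] * 20)
# Constant returns and volume have zero variance, so the multiplier stays at 1.0.
calm = regime_multiplier_sketch([0.01] * 20, [1_000_000.0] * 20)
```

One property worth noting: with a population standard deviation over n points, a lone outlier in an otherwise constant window has |z| = sqrt(n - 1), so a 20-point window bounds |z| at about 4.36.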
# ---------------------------------------------------------------------------
# Combined document signal weight
# ---------------------------------------------------------------------------
@@ -186,6 +421,12 @@ class SignalWeight:
market_ctx_multiplier: float # >= 1.0
combined: float
# New optional fields for probabilistic mode
sigmoid_gate: float | None = None # Smooth gate value [0, 1]
info_gain_factor: float = 1.0 # Surprise multiplier
source_accuracy_factor: float = 1.0 # Historical accuracy multiplier
regime_multiplier: float | None = None # M_regime replacing M_context
def compute_signal_weight(
published_at: datetime,
@@ -196,18 +437,23 @@ def compute_signal_weight(
extraction_confidence: float = 0.5,
market_ctx: MarketContext | None = None,
config: ScoringConfig = DEFAULT_CONFIG,
*,
event_type: str | None = None,
impact_score: float = 0.5,
source_accuracy_factor: float = 1.0,
returns: list[float] | None = None,
volumes: list[float] | None = None,
) -> SignalWeight:
"""Compute the combined aggregation weight for a single document signal.
The formula is:
When ``config.probabilistic`` is False (default), the formula is:
combined = confidence_gate * recency * credibility
* (1 + novelty_bonus) * market_ctx_multiplier
where novelty_bonus = novelty_score * config.novelty_bonus_max
and market_ctx_multiplier >= 1.0 based on volatility/volume features.
Documents with extraction_confidence below config.confidence_floor
receive a combined weight of 0.0 (gated out).
When ``config.probabilistic`` is True, the formula is:
combined = sigmoid_gate * recency(adaptive) * credibility
* (1 + novelty_bonus) * info_gain * source_accuracy
* regime_multiplier
Args:
published_at: Document publication time.
@@ -218,27 +464,82 @@ def compute_signal_weight(
extraction_confidence: Extraction confidence from the model (0-1).
market_ctx: Optional market context features for the symbol.
config: Scoring parameters.
event_type: Optional event type for information gain computation.
impact_score: Signal impact score in [0, 1] (default 0.5).
source_accuracy_factor: Historical source accuracy factor (default 1.0).
returns: Optional list of recent daily returns for regime multiplier.
volumes: Optional list of recent daily volumes for regime multiplier.
Returns:
A ``SignalWeight`` with the component breakdown and combined score.
"""
# Confidence gate
gate = 1.0 if extraction_confidence >= config.confidence_floor else 0.0
rec = recency_weight(published_at, reference_time, window, config)
cred = credibility_weight(source_credibility, config)
bonus = novelty_score * config.novelty_bonus_max
mkt_mult = market_context_multiplier(market_ctx, config)
combined = gate * rec * cred * (1.0 + bonus) * mkt_mult
if not config.probabilistic:
# --- Heuristic mode: preserve exact current formula ---
gate = 1.0 if extraction_confidence >= config.confidence_floor else 0.0
rec = recency_weight(published_at, reference_time, window, config)
mkt_mult = market_context_multiplier(market_ctx, config)
combined = gate * rec * cred * (1.0 + bonus) * mkt_mult
return SignalWeight(
recency=rec,
credibility=cred,
novelty_bonus=bonus,
confidence_gate=gate,
market_ctx_multiplier=mkt_mult,
combined=combined,
)
# --- Probabilistic mode ---
# 1. Sigmoid confidence gate (Req 2.1–2.5)
sg = sigmoid_gate(extraction_confidence, config.sigmoid_steepness, config.sigmoid_midpoint)
# 2. Information gain factor (Req 3.1–3.5)
ig = compute_info_gain(
event_type,
lambda_param=config.info_gain_lambda,
max_gain=config.info_gain_max,
default_base_rate=config.default_base_rate,
)
# 3. Regime multiplier (Req 6.1–6.5) — replaces market_context_multiplier
rm = compute_regime_multiplier(returns, volumes, config)
# 4. Adaptive recency decay (Req 5.1–5.7)
base_half_life = config.half_life_hours.get(window, 72.0)
adaptive_hl = compute_adaptive_half_life(
base_half_life=base_half_life,
impact_score=impact_score,
info_gain_factor=ig,
market_multiplier=rm,
config=config,
)
rec = recency_weight(
published_at, reference_time, window, config,
half_life_override=adaptive_hl,
)
# 5. Source accuracy factor (Req 4.2–4.3)
saf = source_accuracy_factor
# 6. Combined weight
combined = sg * rec * cred * (1.0 + bonus) * ig * saf * rm
return SignalWeight(
recency=rec,
credibility=cred,
novelty_bonus=bonus,
confidence_gate=gate,
market_ctx_multiplier=mkt_mult,
confidence_gate=sg, # sigmoid gate value in probabilistic mode
market_ctx_multiplier=rm, # regime multiplier stored here for compat
combined=combined,
sigmoid_gate=sg,
info_gain_factor=ig,
source_accuracy_factor=saf,
regime_multiplier=rm,
)
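To make the two combined-weight formulas concrete, here is a small arithmetic sketch. All input values are made up, and the confidence floor of 0.5 is an assumption; only the shape of the two products follows the docstring above.

```python
def combined_heuristic(confidence, floor, rec, cred, bonus, mkt_mult):
    """Binary gate times the heuristic factors."""
    gate = 1.0 if confidence >= floor else 0.0
    return gate * rec * cred * (1.0 + bonus) * mkt_mult


def combined_probabilistic(sg, rec, cred, bonus, ig, saf, rm):
    """Sigmoid gate times the probabilistic factors."""
    return sg * rec * cred * (1.0 + bonus) * ig * saf * rm


# A document just below the (assumed) confidence floor of 0.5.
h = combined_heuristic(0.45, 0.5, rec=0.5, cred=0.8, bonus=0.1, mkt_mult=1.0)
# Same document in probabilistic mode: sigmoid gate of about 0.4378 at confidence 0.45,
# a fairly surprising event (ig=1.6) from an accurate source (saf=1.2), calm regime.
p = combined_probabilistic(0.4378, rec=0.5, cred=0.8, bonus=0.1, ig=1.6, saf=1.2, rm=1.0)
```

The heuristic path drops the document entirely, while the probabilistic path keeps it at a reduced weight.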
@@ -256,6 +557,11 @@ class WeightedSignal:
sentiment_value: float # numeric sentiment: +1 positive, -1 negative, 0 neutral/mixed
impact_score: float
# New optional fields for probabilistic mode
info_gain_factor: float = 1.0 # r = 1 + λ·(-log₂ P(event_type))
source_accuracy_factor: float = 1.0 # [0.5, 1.5] from historical accuracy
adaptive_half_life: float | None = None # τ_i when adaptive decay is active
def sentiment_to_numeric(sentiment: str) -> float:
"""Map a sentiment label to a signed numeric value."""
+81 -10
@@ -8,11 +8,12 @@ competitive_signal_records.
Also converts pattern and competitive signals into WeightedSignal
objects for the aggregation engine.
Requirements: 4.1, 4.2, 4.3, 4.4, 4.5, 9.1
Requirements: 4.1, 4.2, 4.3, 4.4, 4.5, 9.1, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7
"""
from __future__ import annotations
import logging
import math
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
@@ -76,6 +77,38 @@ VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
"""
# ---------------------------------------------------------------------------
# Graph-distance attenuation (Requirements: 12.1–12.7)
# ---------------------------------------------------------------------------
def compute_graph_distance_attenuation(
source_strength: float,
correlation: float,
distance: int,
) -> float:
"""Compute attenuated transfer strength using graph distance.
Formula: S_transfer = S_source · ρ_historical · e^(-d_network)
Args:
source_strength: Source signal strength S_source in [0, 1].
correlation: Historical price correlation ρ_historical in [0, 1].
distance: Graph distance d_network (shortest path length, valid range 1 to 3).
Returns:
Transfer strength, non-negative for non-negative inputs. Returns 0.0 when
distance is less than 1 or exceeds 3.
Requirements: 12.1, 12.7
"""
if distance < 1:
return 0.0
if distance > 3:
return 0.0
return source_strength * correlation * math.exp(-distance)
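A short sketch of how quickly the e^(-d) term attenuates a signal across graph distances. The input strength and correlation are arbitrary example values, and `attenuate` is a stand-in name for the function above.

```python
import math


def attenuate(source_strength: float, correlation: float, distance: int) -> float:
    """S_transfer = S_source * rho * e^(-d), zero outside distances 1..3."""
    if distance < 1 or distance > 3:
        return 0.0
    return source_strength * correlation * math.exp(-distance)


# Strength 0.8 and correlation 0.5 at distances 0 through 4.
by_distance = {d: round(attenuate(0.8, 0.5, d), 6) for d in range(5)}
```

Even at distance 1 the factor e^(-1) is about 0.368, so a direct competitor receives at most roughly 37% of the source strength before the correlation is applied.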
# ---------------------------------------------------------------------------
# propagate_signals
# ---------------------------------------------------------------------------
@@ -87,10 +120,20 @@ async def propagate_signals(
impact_score: float,
document_id: str,
config: Optional[CompetitiveConfig] = None,
*,
probabilistic: bool = False,
) -> list[CompetitiveSignalRecord]:
"""Look up competitors, query cross-company patterns, produce weighted
competitive signals, and persist them.
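When ``probabilistic=True``, uses graph-distance attenuation:
S_transfer = S_source · ρ_historical · e^(-d_network)
where ρ_historical is meant to be the 90-day rolling Pearson correlation and
d_network the shortest path in the competitor relationship graph (at most 3).
In this function the relationship strength, floored at 0.1, stands in for
ρ_historical, and d_network is fixed at 1 for direct competitors.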
When ``probabilistic=False``, preserves the existing flat transfer
behavior.
Args:
pool: asyncpg connection pool.
ticker: Source company ticker that received the catalyst.
@@ -98,9 +141,12 @@ async def propagate_signals(
impact_score: The source document's impact score.
document_id: The source document ID.
config: Optional competitive config overrides.
probabilistic: Use graph-distance attenuation when True.
Returns:
List of CompetitiveSignalRecord objects produced and persisted.
Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7
"""
cfg = config or CompetitiveConfig()
now = datetime.now(timezone.utc)
@@ -127,7 +173,7 @@ async def propagate_signals(
# Determine the competitor ticker (the other side of the relationship)
competitor_ticker = ticker_b if ticker_a == ticker else ticker_a
# Threshold gating (Req 4.5)
# Threshold gating (Req 4.5 / Req 12.6)
if rel_strength < cfg.propagation_strength_threshold:
logger.info(
"Skipping propagation %s%s: relationship strength %.3f "
@@ -161,14 +207,39 @@ async def propagate_signals(
)
continue
# Compute signal strength (Req 4.3)
raw_strength = (
pattern.avg_strength
* rel_strength
* pattern.pattern_confidence
* impact_score
)
signal_strength = min(max(raw_strength, 0.0), 1.0)
if probabilistic:
# Graph-distance attenuation (Req 12.1–12.7)
# For direct competitors, graph distance = 1
graph_distance = 1
# Use relationship strength as a proxy for historical
# correlation when full correlation data is unavailable.
# Default correlation: 0.3 same-sector, 0.1 cross-sector.
# Here we use rel_strength as a reasonable proxy since
# the full 90-day Pearson correlation requires market data
# that is fetched asynchronously in the integration layer.
correlation = max(rel_strength, 0.1)
source_strength = (
pattern.avg_strength
* pattern.pattern_confidence
* impact_score
)
raw_strength = compute_graph_distance_attenuation(
source_strength=min(max(source_strength, 0.0), 1.0),
correlation=correlation,
distance=graph_distance,
)
signal_strength = min(max(raw_strength, 0.0), 1.0)
else:
# Flat transfer (existing behavior, Req 4.3)
raw_strength = (
pattern.avg_strength
* rel_strength
* pattern.pattern_confidence
* impact_score
)
signal_strength = min(max(raw_strength, 0.0), 1.0)
# Determine direction
direction = (
+164
@@ -0,0 +1,164 @@
"""Source accuracy tracker for historical prediction accuracy per source.
Tracks per-source accuracy metrics (fraction of correct directional calls)
used by the probabilistic scoring pipeline to weight source credibility.
Accuracy data is stored in the ``source_accuracy`` database table and
fetched in batch at the start of each aggregation cycle.
Requirements: 4.1, 4.2, 4.3, 4.4, 4.5
"""
from __future__ import annotations
import logging
from dataclasses import dataclass
from datetime import datetime, timezone
import asyncpg
logger = logging.getLogger(__name__)
@dataclass
class SourceAccuracy:
"""Per-source historical prediction accuracy.
Attributes:
source_id: Unique identifier for the signal source.
accuracy_ratio: Fraction of correct directional calls, in [0, 1].
sample_count: Number of signals with known outcomes.
last_updated: Timestamp of the most recent accuracy update.
"""
source_id: str
accuracy_ratio: float
sample_count: int
last_updated: datetime
@property
def accuracy_factor(self) -> float:
"""Multiplicative factor for credibility weight.
Returns 1.0 (neutral) when sample_count < 10.
Otherwise scales linearly from 0.5 (0% accuracy) to 1.5
(100% accuracy). Corrupted accuracy_ratio values outside
[0, 1] are clamped before computing the factor.
"""
if self.sample_count < 10:
return 1.0
clamped = max(0.0, min(1.0, self.accuracy_ratio))
return 0.5 + clamped
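The mapping from accuracy ratio to multiplicative factor can be checked with a few values. This is a free-function sketch of the property above; `accuracy_factor_sketch` is an illustrative name.

```python
def accuracy_factor_sketch(accuracy_ratio: float, sample_count: int, min_samples: int = 10) -> float:
    """Neutral 1.0 below the sample threshold, else 0.5 + clamped ratio."""
    if sample_count < min_samples:
        return 1.0
    return 0.5 + max(0.0, min(1.0, accuracy_ratio))


too_few = accuracy_factor_sketch(0.9, 5)      # not enough samples: neutral
good = accuracy_factor_sketch(0.7, 50)        # 70% accurate source is boosted
coin_flip = accuracy_factor_sketch(0.5, 50)   # 50% accuracy is neutral
corrupted = accuracy_factor_sketch(1.7, 50)   # out-of-range ratio is clamped first
```

A source at exactly 50% directional accuracy lands on 1.0, the same neutral value used for sources with too few samples.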
async def fetch_source_accuracy(
pool: asyncpg.Pool,
source_ids: list[str],
) -> dict[str, SourceAccuracy]:
"""Fetch accuracy metrics for a batch of sources.
Queries the ``source_accuracy`` table for all requested *source_ids*
in a single round-trip. Returns a mapping from source_id to its
:class:`SourceAccuracy` record.
When the database is unreachable or the query fails, returns an empty
dict so that callers fall back to the neutral accuracy factor of 1.0.
"""
if not source_ids:
return {}
try:
rows = await pool.fetch(
"""
SELECT source_id, accuracy_ratio, sample_count, last_updated
FROM source_accuracy
WHERE source_id = ANY($1::varchar[])
""",
source_ids,
)
except Exception:
logger.warning(
"Failed to fetch source accuracy; defaulting to neutral factor",
exc_info=True,
)
return {}
result: dict[str, SourceAccuracy] = {}
for row in rows:
sid = row["source_id"]
ratio = row["accuracy_ratio"]
# Clamp corrupted accuracy_ratio to [0.0, 1.0]
ratio = max(0.0, min(1.0, float(ratio)))
result[sid] = SourceAccuracy(
source_id=sid,
accuracy_ratio=ratio,
sample_count=int(row["sample_count"]),
last_updated=row["last_updated"],
)
return result
async def update_source_accuracy(
pool: asyncpg.Pool,
source_id: str,
realized_outcomes: list[tuple[str, float]],
) -> None:
"""Update accuracy metrics for a source from realized price outcomes.
Each element of *realized_outcomes* is a ``(predicted_direction,
actual_7d_return)`` pair. A prediction is considered correct when:
* ``predicted_direction`` is ``"bullish"`` and ``actual_7d_return > 0``
* ``predicted_direction`` is ``"bearish"`` and ``actual_7d_return < 0``
Neutral predictions and zero returns are excluded from the accuracy
calculation.
The function upserts the ``source_accuracy`` row, merging the new
outcomes with any existing sample count and accuracy ratio.
"""
if not realized_outcomes:
return
# Count correct directional calls from the new outcomes.
correct = 0
total = 0
for predicted_direction, actual_return in realized_outcomes:
direction = predicted_direction.lower()
if direction not in ("bullish", "bearish"):
continue
if actual_return == 0.0:
continue
total += 1
if direction == "bullish" and actual_return > 0:
correct += 1
elif direction == "bearish" and actual_return < 0:
correct += 1
if total == 0:
return
now = datetime.now(timezone.utc)
try:
await pool.execute(
"""
INSERT INTO source_accuracy (source_id, accuracy_ratio, sample_count, last_updated)
VALUES ($1, $2, $3, $4)
ON CONFLICT (source_id) DO UPDATE SET
accuracy_ratio = (
source_accuracy.accuracy_ratio * source_accuracy.sample_count
+ $2 * $3
) / NULLIF(source_accuracy.sample_count + $3, 0),
sample_count = source_accuracy.sample_count + $3,
last_updated = $4
""",
source_id,
correct / total,
total,
now,
)
except Exception:
logger.warning(
"Failed to update source accuracy for %s; continuing with stale data",
source_id,
exc_info=True,
)
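The upsert merges old and new outcomes as a sample-weighted average. The sketch below reproduces the counting rules and that arithmetic in plain Python so they can be checked without a database; the function names are illustrative.

```python
def score_outcomes(outcomes):
    """Count (correct, total) directional calls, skipping neutral calls and zero returns."""
    correct = total = 0
    for direction, actual_return in outcomes:
        d = direction.lower()
        if d not in ("bullish", "bearish") or actual_return == 0.0:
            continue
        total += 1
        if (d == "bullish" and actual_return > 0) or (d == "bearish" and actual_return < 0):
            correct += 1
    return correct, total


def merge_accuracy(old_ratio, old_count, new_ratio, new_count):
    """Sample-weighted average, mirroring the ON CONFLICT expression."""
    merged_count = old_count + new_count
    if merged_count == 0:
        return old_ratio, 0
    return (old_ratio * old_count + new_ratio * new_count) / merged_count, merged_count


correct, total = score_outcomes([
    ("bullish", 0.03),   # correct
    ("bearish", -0.01),  # correct
    ("bullish", -0.02),  # wrong
    ("neutral", 0.05),   # skipped: not a directional call
    ("bearish", 0.0),    # skipped: zero return
])
# Existing row: 60% over 20 samples. New batch: 8 of 10 correct.
ratio, count = merge_accuracy(0.6, 20, 0.8, 10)
```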
+520 -14
@@ -19,6 +19,10 @@ from typing import Any
import asyncpg
from services.aggregation.bayesian import (
BayesianPosterior,
compute_bayesian_posterior,
)
from services.aggregation.contradiction import CatalystEntry, detect_contradictions
from services.aggregation.evidence import (
EvidenceRankConfig,
@@ -28,6 +32,7 @@ from services.aggregation.evidence import (
from services.aggregation.evidence import (
rank_evidence as _rank_evidence_composite,
)
from services.aggregation.interpolation import integrate_macro_signals
from services.aggregation.market_context import fetch_market_context
from services.aggregation.pattern_matcher import find_self_patterns
from services.aggregation.projection import (
@@ -35,6 +40,11 @@ from services.aggregation.projection import (
compute_projection,
persist_trend_projection,
)
from services.aggregation.regime import (
MarketRegime,
RegimeClassification,
classify_regime,
)
from services.aggregation.scoring import (
ScoringConfig,
WeightedSignal,
@@ -46,6 +56,7 @@ from services.aggregation.signal_propagation import (
CompetitiveSignalRecord,
build_pattern_weighted_signals,
)
from services.aggregation.source_accuracy import fetch_source_accuracy
from services.shared.metrics import (
AGGREGATION_CONTRADICTION_SCORE,
AGGREGATION_DURATION,
@@ -80,6 +91,7 @@ class AggregationConfig:
macro_enabled: bool = True # runtime toggle state
competitive_signal_weight: float = 0.2 # relative weight of pattern signals
competitive_enabled: bool = True # runtime toggle state
probabilistic_scoring_enabled: bool = False # probabilistic pipeline toggle
def effective_windows(self) -> list[str]:
if self.windows:
@@ -232,6 +244,59 @@ async def fetch_competitive_enabled(pool: asyncpg.Pool) -> bool | None:
return row["competitive_enabled"].lower() == "true"
# ---------------------------------------------------------------------------
# Fetch probabilistic scoring toggle from risk_configs
#
# PROBABILISTIC PIPELINE TOGGLE (Requirements 16.3, 16.4, 16.5, 16.6, 16.7):
# - Read once per aggregation cycle from the risk_configs table.
# - When False (default): the heuristic pipeline is used — identical outputs
# to the current system.
# - When True: the new Bayesian, regime-aware, and adaptive formulas are
# used for all pipeline stages.
# - Defaults to False when the key is missing, the value is invalid, or the
# database is unreachable (fail-safe to heuristic mode).
# ---------------------------------------------------------------------------
_PROBABILISTIC_TOGGLE_QUERY = """
SELECT config->>'probabilistic_scoring_enabled' AS probabilistic_scoring_enabled
FROM risk_configs
WHERE active = TRUE
ORDER BY updated_at DESC
LIMIT 1
"""
async def fetch_probabilistic_scoring_enabled(pool: asyncpg.Pool) -> bool:
"""Check probabilistic scoring toggle from risk_configs table.
Returns True when explicitly enabled, False in all other cases
(missing key, invalid value, no config row, DB error).
This is fail-safe: any failure defaults to the heuristic pipeline.
Requirements: 16.3, 16.6
"""
try:
row = await pool.fetchrow(_PROBABILISTIC_TOGGLE_QUERY)
if row is None or row["probabilistic_scoring_enabled"] is None:
return False
raw = row["probabilistic_scoring_enabled"]
if not isinstance(raw, str) or raw.lower() not in ("true", "false"):
logger.warning(
"Invalid probabilistic_scoring_enabled value %r in "
"risk_configs; defaulting to heuristic pipeline",
raw,
)
return False
return raw.lower() == "true"
except Exception:
logger.warning(
"Failed to read probabilistic_scoring_enabled from risk_configs; "
"defaulting to heuristic pipeline",
exc_info=True,
)
return False
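The fail-safe parsing rule can be isolated into a pure function that is easy to unit test. This sketch leaves out the logging and the database call, and `parse_toggle` is a hypothetical helper name.

```python
def parse_toggle(raw: object) -> bool:
    """Return True only for the string 'true' (any case); everything else is False."""
    if not isinstance(raw, str):
        return False
    return raw.lower() == "true"


results = {repr(v): parse_toggle(v) for v in ("true", "TRUE", "false", "yes", "", None, 1, True)}
```

The Python boolean `True` is also rejected here, because the JSONB `->>` operator in the query returns text and a non-string value is treated as invalid.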
# ---------------------------------------------------------------------------
# Fetch competitive signals targeting a ticker within a time window
# ---------------------------------------------------------------------------
@@ -366,6 +431,9 @@ def build_macro_weighted_signals(
window: str,
macro_signal_weight: float = 0.3,
config: ScoringConfig | None = None,
*,
returns: list[float] | None = None,
volumes: list[float] | None = None,
) -> list[WeightedSignal]:
"""Convert macro impact records into WeightedSignal objects.
@@ -375,6 +443,9 @@ def build_macro_weighted_signals(
- impact_score = macro_impact_score * macro_signal_weight
- recency decay from the global event's publication time
- confidence gating from the macro record's confidence
When ``config.probabilistic`` is True, passes returns/volumes for
regime multiplier computation.
"""
cfg = config or ScoringConfig()
signals: list[WeightedSignal] = []
@@ -387,6 +458,8 @@ def build_macro_weighted_signals(
novelty_score=0.5,
extraction_confidence=mir.confidence,
config=cfg,
returns=returns,
volumes=volumes,
)
sentiment = _DIRECTION_TO_SENTIMENT.get(mir.impact_direction, 0.0)
impact = mir.macro_impact_score * macro_signal_weight
@@ -412,11 +485,24 @@ def build_weighted_signals(
window: str,
market_ctx: Any | None = None,
config: ScoringConfig | None = None,
*,
source_accuracy_map: dict[str, float] | None = None,
returns: list[float] | None = None,
volumes: list[float] | None = None,
) -> list[WeightedSignal]:
"""Convert impact records into WeightedSignal objects using the scoring module."""
"""Convert impact records into WeightedSignal objects using the scoring module.
When ``config.probabilistic`` is True, passes source accuracy factors,
event types, and market data (returns/volumes) to the scoring pipeline
for regime multiplier and adaptive decay computation.
"""
cfg = config or ScoringConfig()
accuracy_map = source_accuracy_map or {}
signals: list[WeightedSignal] = []
for imp in impacts:
# Look up the source accuracy factor, defaulting to the neutral 1.0.
# Note: the map is keyed by imp.document_id here, so callers must supply that keying.
saf = accuracy_map.get(imp.document_id, 1.0)
sw = compute_signal_weight(
published_at=imp.published_at,
reference_time=reference_time,
@@ -426,6 +512,11 @@ def build_weighted_signals(
extraction_confidence=imp.confidence,
market_ctx=market_ctx,
config=cfg,
event_type=imp.catalyst_type if cfg.probabilistic else None,
impact_score=imp.impact_score,
source_accuracy_factor=saf,
returns=returns,
volumes=volumes,
)
signals.append(
WeightedSignal(
@@ -433,6 +524,8 @@ def build_weighted_signals(
weight=sw,
sentiment_value=sentiment_to_numeric(imp.sentiment),
impact_score=imp.impact_score,
info_gain_factor=sw.info_gain_factor,
source_accuracy_factor=sw.source_accuracy_factor,
)
)
return signals
@@ -649,10 +742,15 @@ def assemble_trend_summary(
market_ctx: Any | None = None,
max_evidence: int = MAX_EVIDENCE_REFS,
reference_time: datetime | None = None,
*,
probabilistic: bool = False,
regime: RegimeClassification | None = None,
) -> TrendSummary:
"""Build a complete TrendSummary from weighted signals and impact records."""
result = assemble_trend_with_evidence(
ticker, window, signals, impacts, market_ctx, max_evidence, reference_time,
probabilistic=probabilistic,
regime=regime,
)
return result.summary
@@ -665,8 +763,25 @@ def assemble_trend_with_evidence(
market_ctx: Any | None = None,
max_evidence: int = MAX_EVIDENCE_REFS,
reference_time: datetime | None = None,
*,
probabilistic: bool = False,
regime: RegimeClassification | None = None,
) -> AssembledTrend:
"""Build a TrendSummary and return detailed evidence rankings for persistence."""
"""Build a TrendSummary and return detailed evidence rankings for persistence.
When ``probabilistic`` is True:
- Computes Bayesian posterior from merged signals
- Uses Bayesian confidence formula for trend confidence
- Uses entropy-based direction classification
- Applies regime-adjusted thresholds
- Populates probabilistic TrendSummary fields
- Stores probabilistic outputs in market_context JSONB
When ``probabilistic`` is False:
- Preserves exact current heuristic behavior (no changes)
Requirements: 1.1, 1.2, 8.1–8.5, 9.1–9.6, 7.8, 16.4, 16.5
"""
if reference_time is None:
reference_time = datetime.now(timezone.utc)
@@ -677,15 +792,102 @@ def assemble_trend_with_evidence(
CatalystEntry(document_id=imp.document_id, catalyst_type=imp.catalyst_type)
for imp in impacts
]
contradiction_result = detect_contradictions(signals, catalyst_entries)
contradiction_result = detect_contradictions(
signals, catalyst_entries, probabilistic=probabilistic,
)
contradiction = contradiction_result.score
direction = derive_trend_direction(avg_sentiment, contradiction)
confidence = compute_trend_confidence(signals, contradiction)
if not probabilistic:
# --- Heuristic mode: preserve exact current behavior ---
direction = derive_trend_direction(avg_sentiment, contradiction)
confidence = compute_trend_confidence(signals, contradiction)
# Get detailed evidence rankings for persistence
ev_config = EvidenceRankConfig(max_refs=max_evidence)
supporting_ranked, opposing_ranked = rank_evidence_detailed(signals, ev_config)
supporting = list(dict.fromkeys(r.document_id for r in supporting_ranked))
opposing = list(dict.fromkeys(r.document_id for r in opposing_ranked))
catalysts, risks = extract_catalysts_and_risks(impacts, signals)
# Trend strength: absolute value of weighted sentiment, clamped to [0, 1]
strength = round(min(abs(avg_sentiment), 1.0), 4)
summary = TrendSummary(
entity_type="company",
entity_id=ticker,
window=TrendWindow(window),
trend_direction=direction,
trend_strength=strength,
confidence=confidence,
top_supporting_evidence=supporting,
top_opposing_evidence=opposing,
dominant_catalysts=catalysts,
material_risks=risks,
contradiction_score=contradiction,
disagreement_details=contradiction_result.details,
market_context=market_ctx,
generated_at=reference_time,
)
return AssembledTrend(
summary=summary,
supporting_evidence=supporting_ranked,
opposing_evidence=opposing_ranked,
)
# --- Probabilistic mode (Req 8.1–8.5, 9.1–9.6) ---
# Default to uncertainty regime when not provided (Req 7.9)
if regime is None:
regime = RegimeClassification(
regime=MarketRegime.UNCERTAINTY,
trend_indicator=0.0,
volatility_ratio=1.0,
bullish_threshold=0.15,
bearish_threshold=-0.15,
contradiction_penalty_multiplier=0.6,
)
# Compute Bayesian posterior from merged signals (Req 1.1, 1.2)
posterior: BayesianPosterior = compute_bayesian_posterior(signals)
# --- Bayesian confidence formula (Req 8.1–8.4) ---
# confidence = 0.5 × C_bayesian + 0.25 × F_count + 0.25 × C_avg_credibility - P_contradiction
active = [s for s in signals if s.weight.combined > 0]
unique_sources = len({s.document_id for s in active if s.document_id}) if active else 0
f_count = min(unique_sources / 15.0, 0.8)
avg_credibility = (
sum(s.weight.credibility for s in active) / len(active) if active else 0.0
)
# Contradiction penalty uses regime-adjusted multiplier (Req 7.7)
contradiction_penalty = contradiction * regime.contradiction_penalty_multiplier
confidence = (
0.5 * posterior.bayesian_confidence
+ 0.25 * f_count
+ 0.25 * avg_credibility
- contradiction_penalty
)
confidence = round(max(0.0, min(1.0, confidence)), 4)
# --- Entropy-based direction (Req 9.1–9.5) ---
# Fixed P_bull thresholds for direction: 0.65 / 0.35
if posterior.entropy > 0.9:
direction = TrendDirection.MIXED
elif posterior.p_bull > 0.65:
direction = TrendDirection.BULLISH
elif posterior.p_bull < 0.35:
direction = TrendDirection.BEARISH
else:
direction = TrendDirection.NEUTRAL
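The confidence blend and direction rule above can be sketched as two pure functions. The inputs are passed as plain numbers, string labels stand in for `TrendDirection`, and the entropy value is treated as an opaque input because its definition lives in the Bayesian module, which this diff does not show.

```python
def trend_confidence(bayes_conf, unique_docs, avg_cred, contradiction, penalty_mult):
    """0.5*C_bayesian + 0.25*F_count + 0.25*C_avg_credibility - P_contradiction, clamped."""
    f_count = min(unique_docs / 15.0, 0.8)
    raw = 0.5 * bayes_conf + 0.25 * f_count + 0.25 * avg_cred - contradiction * penalty_mult
    return round(max(0.0, min(1.0, raw)), 4)


def trend_direction(p_bull: float, entropy: float) -> str:
    """High entropy wins first; otherwise fixed 0.65 / 0.35 thresholds on P_bull."""
    if entropy > 0.9:
        return "mixed"
    if p_bull > 0.65:
        return "bullish"
    if p_bull < 0.35:
        return "bearish"
    return "neutral"


# 0.5*0.8 + 0.25*0.6 + 0.25*0.7 - 0.2*0.6 = 0.605
conf = trend_confidence(0.8, unique_docs=9, avg_cred=0.7, contradiction=0.2, penalty_mult=0.6)
labels = [trend_direction(0.7, 0.5), trend_direction(0.7, 0.95),
          trend_direction(0.5, 0.5), trend_direction(0.2, 0.3)]
```

Because `f_count` is capped at 0.8, the blend tops out at 0.95 when the Bayesian confidence and average credibility are each at most 1.0, which is assumed here.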
# Get detailed evidence rankings for persistence
config = EvidenceRankConfig(max_refs=max_evidence)
supporting_ranked, opposing_ranked = rank_evidence_detailed(signals, config)
ev_config = EvidenceRankConfig(max_refs=max_evidence)
supporting_ranked, opposing_ranked = rank_evidence_detailed(signals, ev_config)
supporting = list(dict.fromkeys(r.document_id for r in supporting_ranked))
opposing = list(dict.fromkeys(r.document_id for r in opposing_ranked))
@@ -695,6 +897,30 @@ def assemble_trend_with_evidence(
# Trend strength: absolute value of weighted sentiment, clamped to [0, 1]
strength = round(min(abs(avg_sentiment), 1.0), 4)
# Build probabilistic JSONB data for market_context storage
probabilistic_data = {
"p_bull": round(posterior.p_bull, 6),
"alpha": round(posterior.alpha, 4),
"beta": round(posterior.beta, 4),
"log_likelihood": round(posterior.log_likelihood, 6),
"bayesian_confidence": round(posterior.bayesian_confidence, 6),
"entropy": round(posterior.entropy, 6),
"regime": regime.regime.value,
"regime_volatility_ratio": round(regime.volatility_ratio, 4),
"pipeline_mode": "probabilistic",
"contradiction_entropy": round(contradiction, 4),
}
# Enrich market_context with probabilistic outputs
if market_ctx is not None and hasattr(market_ctx, "model_dump"):
enriched_ctx_data = market_ctx.model_dump()
enriched_ctx_data["probabilistic"] = probabilistic_data
enriched_market_ctx = enriched_ctx_data
elif isinstance(market_ctx, dict):
enriched_market_ctx = {**market_ctx, "probabilistic": probabilistic_data}
else:
enriched_market_ctx = {"probabilistic": probabilistic_data}
summary = TrendSummary(
entity_type="company",
entity_id=ticker,
@@ -708,8 +934,16 @@ def assemble_trend_with_evidence(
material_risks=risks,
contradiction_score=contradiction,
disagreement_details=contradiction_result.details,
market_context=market_ctx,
market_context=enriched_market_ctx,
generated_at=reference_time,
# Probabilistic fields (Req 9.6, 16.1)
p_bull=round(posterior.p_bull, 6),
alpha=round(posterior.alpha, 4),
beta_param=round(posterior.beta, 4),
bayesian_confidence=round(posterior.bayesian_confidence, 6),
entropy=round(posterior.entropy, 6),
regime=regime.regime.value,
pipeline_mode="probabilistic",
)
return AssembledTrend(
@@ -782,7 +1016,12 @@ async def persist_trend_summary(
json.dumps(summary.material_risks),
summary.contradiction_score,
json.dumps([d.model_dump() for d in summary.disagreement_details]),
- json.dumps(summary.market_context.model_dump() if summary.market_context else {}, default=str),
+ json.dumps(
+ summary.market_context.model_dump()
+ if hasattr(summary.market_context, "model_dump")
+ else (summary.market_context if summary.market_context else {}),
+ default=str,
+ ),
summary.generated_at,
)
trend_id = str(row["id"])
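Review note: the serialisation change in this hunk can be read as "dump the model if it is one, otherwise pass a dict through, otherwise store `{}`". A minimal standalone sketch of that rule follows. `serialize_market_context` is a hypothetical helper name, not a function in the diff; `default=str` is the standard `json.dumps` hook that stringifies values such as `datetime` which JSON cannot encode natively.

```python
import json
from datetime import datetime, timezone
from typing import Any


def serialize_market_context(ctx: Any) -> str:
    """Serialise a model-like object, a dict, or None to a JSON string."""
    if hasattr(ctx, "model_dump"):
        payload = ctx.model_dump()
    else:
        # Falsy values (None, empty dict) are stored as an empty object.
        payload = ctx if ctx else {}
    # default=str is called only for values json cannot encode itself.
    return json.dumps(payload, default=str)


example = serialize_market_context(
    {"as_of": datetime(2024, 1, 2, tzinfo=timezone.utc), "probabilistic": {"p_bull": 0.61}}
)
```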
@@ -933,6 +1172,131 @@ async def _build_macro_event_infos(
return infos
# ---------------------------------------------------------------------------
# Regime detection helper (Req 7.1, 7.2, 7.3, 7.8, 7.9)
# ---------------------------------------------------------------------------
_CLOSING_PRICES_QUERY = """
SELECT close
FROM market_data_daily
WHERE ticker = $1
ORDER BY bar_date DESC
LIMIT 120
"""
_DAILY_RETURNS_QUERY = """
SELECT (close - LAG(close) OVER (ORDER BY bar_date)) / NULLIF(LAG(close) OVER (ORDER BY bar_date), 0) AS daily_return
FROM market_data_daily
WHERE ticker = $1
ORDER BY bar_date DESC
LIMIT 120
"""
_DAILY_VOLUMES_QUERY = """
SELECT volume
FROM market_data_daily
WHERE ticker = $1
ORDER BY bar_date DESC
LIMIT 30
"""
# Default uncertainty regime used when market data is unavailable
_DEFAULT_UNCERTAINTY_REGIME = RegimeClassification(
regime=MarketRegime.UNCERTAINTY,
trend_indicator=0.0,
volatility_ratio=1.0,
bullish_threshold=0.15,
bearish_threshold=-0.15,
contradiction_penalty_multiplier=0.6,
)
async def _classify_ticker_regime(
pool: asyncpg.Pool,
ticker: str,
) -> RegimeClassification:
"""Classify market regime for a ticker from historical price data.
Fetches closing prices and daily returns, then delegates to
``classify_regime``. Falls back to the uncertainty regime when
market data is unavailable or insufficient.
Requirements: 7.1, 7.2, 7.3, 7.8, 7.9
"""
try:
price_rows = await pool.fetch(_CLOSING_PRICES_QUERY, ticker)
if not price_rows:
logger.info(
"No market data for %s — defaulting to uncertainty regime",
ticker,
)
return _DEFAULT_UNCERTAINTY_REGIME
# Prices come in DESC order; reverse to chronological
closing_prices = [float(r["close"]) for r in reversed(price_rows) if r["close"] is not None]
return_rows = await pool.fetch(_DAILY_RETURNS_QUERY, ticker)
# Returns come in DESC order; reverse to chronological, skip NULLs
returns = [
float(r["daily_return"])
for r in reversed(return_rows)
if r["daily_return"] is not None
]
if not closing_prices or not returns:
logger.info(
"Insufficient market data for %s — defaulting to uncertainty regime",
ticker,
)
return _DEFAULT_UNCERTAINTY_REGIME
return classify_regime(closing_prices, returns)
except Exception:
logger.warning(
"Failed to classify regime for %s — defaulting to uncertainty regime",
ticker,
exc_info=True,
)
return _DEFAULT_UNCERTAINTY_REGIME
async def _fetch_ticker_market_data(
pool: asyncpg.Pool,
ticker: str,
) -> tuple[list[float] | None, list[float] | None]:
"""Fetch recent daily returns and volumes for regime multiplier scoring.
Returns (returns, volumes) where each is a chronological list or None
if data is unavailable. Used by the probabilistic scoring pipeline
to compute regime multiplier M_regime in ``compute_signal_weight``.
"""
try:
return_rows = await pool.fetch(_DAILY_RETURNS_QUERY, ticker)
returns = [
float(r["daily_return"])
for r in reversed(return_rows)
if r["daily_return"] is not None
] if return_rows else None
volume_rows = await pool.fetch(_DAILY_VOLUMES_QUERY, ticker)
volumes = [
float(r["volume"])
for r in reversed(volume_rows)
if r["volume"] is not None
] if volume_rows else None
return returns or None, volumes or None
except Exception:
logger.warning(
"Failed to fetch market data for %s scoring — "
"regime multiplier will default to 1.0",
ticker,
exc_info=True,
)
return None, None
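Review note: `_DAILY_RETURNS_QUERY` computes each return against the previous bar with `LAG`, guards against a zero previous close with `NULLIF`, and both helpers then reverse the newest-first rows and drop NULLs. The sketch below mirrors that arithmetic in plain Python for reasoning about edge cases. The function names are hypothetical and the input is assumed to be already in chronological order.

```python
from typing import List, Optional


def daily_returns(closes: List[Optional[float]]) -> List[Optional[float]]:
    """Mirror (close - LAG(close)) / NULLIF(LAG(close), 0) over chronological closes."""
    out: List[Optional[float]] = []
    prev: Optional[float] = None
    for close in closes:
        if prev is None or prev == 0 or close is None:
            # First bar, zero previous close, or missing close: SQL yields NULL.
            out.append(None)
        else:
            out.append((close - prev) / prev)
        prev = close
    return out


def recent_chronological(returns: List[Optional[float]], limit: int) -> List[float]:
    """Keep the last `limit` rows (limit > 0) in chronological order, skipping NULLs."""
    return [r for r in returns[-limit:] if r is not None]
```

Because NULLs are dropped after the `LIMIT 120`, the resulting list can be shorter than 120 entries even when 120 bars exist.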
# ---------------------------------------------------------------------------
# Main aggregation entry point for a single ticker + window
# ---------------------------------------------------------------------------
@@ -944,6 +1308,12 @@ async def aggregate_company_window(
window: str,
reference_time: datetime | None = None,
config: AggregationConfig | None = None,
*,
probabilistic: bool = False,
regime: RegimeClassification | None = None,
source_accuracy_map: dict[str, float] | None = None,
ticker_returns: list[float] | None = None,
ticker_volumes: list[float] | None = None,
) -> TrendSummary:
"""Compute and persist a trend summary for one ticker and one window.
@@ -954,14 +1324,47 @@ async def aggregate_company_window(
4. Build weighted signals using the scoring module.
5. Check macro toggle and fetch/merge macro signals if enabled.
6. Check competitive toggle and fetch/merge pattern/competitive signals if enabled.
- 7. Assemble the TrendSummary.
+ 7. Assemble the TrendSummary (probabilistic or heuristic).
8. Persist to trend_windows table.
When ``probabilistic`` is True, the scoring config is set to
probabilistic mode, source accuracy factors are passed to signal
scoring, and macro integration uses the conditional modifier.
Returns the assembled TrendSummary.
"""
cfg = config or AggregationConfig()
scoring_cfg = cfg.effective_scoring()
# When probabilistic mode is active, create a scoring config with
# probabilistic=True so all downstream scoring uses the new formulas.
if probabilistic and not scoring_cfg.probabilistic:
scoring_cfg = ScoringConfig(
half_life_hours=scoring_cfg.half_life_hours,
min_recency_weight=scoring_cfg.min_recency_weight,
credibility_floor=scoring_cfg.credibility_floor,
credibility_ceiling=scoring_cfg.credibility_ceiling,
credibility_exponent=scoring_cfg.credibility_exponent,
novelty_bonus_max=scoring_cfg.novelty_bonus_max,
confidence_floor=scoring_cfg.confidence_floor,
volatility_recency_boost_threshold=scoring_cfg.volatility_recency_boost_threshold,
volatility_recency_boost_max=scoring_cfg.volatility_recency_boost_max,
volume_surge_threshold_pct=scoring_cfg.volume_surge_threshold_pct,
volume_surge_boost=scoring_cfg.volume_surge_boost,
probabilistic=True,
sigmoid_steepness=scoring_cfg.sigmoid_steepness,
sigmoid_midpoint=scoring_cfg.sigmoid_midpoint,
info_gain_lambda=scoring_cfg.info_gain_lambda,
info_gain_max=scoring_cfg.info_gain_max,
default_base_rate=scoring_cfg.default_base_rate,
adaptive_decay_impact_scale=scoring_cfg.adaptive_decay_impact_scale,
adaptive_decay_surprise_scale=scoring_cfg.adaptive_decay_surprise_scale,
adaptive_decay_market_scale=scoring_cfg.adaptive_decay_market_scale,
regime_return_weight=scoring_cfg.regime_return_weight,
regime_volume_weight=scoring_cfg.regime_volume_weight,
regime_multiplier_max=scoring_cfg.regime_multiplier_max,
)
if reference_time is None:
reference_time = datetime.now(timezone.utc)
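Review note: the hunk above rebuilds `ScoringConfig` field by field just to flip `probabilistic`, which means any field added to the config later has to be remembered here too. If `ScoringConfig` is a dataclass (the diff does not show its definition, so this is an assumption), `dataclasses.replace` gives the same copy-with-override in one call. The class below is a hypothetical three-field stand-in with illustrative defaults, not the real config.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScoringConfigSketch:
    """Hypothetical stand-in; the real ScoringConfig has many more fields."""

    half_life_hours: float = 48.0   # illustrative default, not from the diff
    credibility_floor: float = 0.1  # illustrative default, not from the diff
    probabilistic: bool = False


def with_probabilistic(cfg: ScoringConfigSketch) -> ScoringConfigSketch:
    """Return cfg unchanged if already probabilistic, else a copy with the flag set."""
    if cfg.probabilistic:
        return cfg
    # replace() copies every field and overrides only the ones named here.
    return replace(cfg, probabilistic=True)
```

If the real class is a Pydantic v2 model instead, `model_copy(update={"probabilistic": True})` is the analogous call.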
@@ -975,9 +1378,13 @@ async def aggregate_company_window(
# 2. Fetch market context
market_ctx = await fetch_market_context(pool, ticker, window, reference_time)
- # 3. Build weighted signals
+ # 3. Build weighted signals - pass source accuracy and market data
+ # when in probabilistic mode (Req 4.1-4.3, 6.1-6.5)
signals = build_weighted_signals(
impacts, reference_time, window, market_ctx, scoring_cfg,
source_accuracy_map=source_accuracy_map if probabilistic else None,
returns=ticker_returns if probabilistic else None,
volumes=ticker_volumes if probabilistic else None,
)
# 4. Check macro toggle and merge macro signals
@@ -991,6 +1398,7 @@ async def aggregate_company_window(
if db_toggle is not None:
macro_enabled = db_toggle
macro_modifier = 1.0
if macro_enabled:
macro_impacts = await fetch_macro_impact_records(
pool, ticker, window_start, reference_time,
@@ -1002,11 +1410,31 @@ async def aggregate_company_window(
window,
macro_signal_weight=cfg.macro_signal_weight,
config=scoring_cfg,
returns=ticker_returns if probabilistic else None,
volumes=ticker_volumes if probabilistic else None,
)
- signals = signals + macro_signals
+ if probabilistic:
+ # Probabilistic mode: use conditional macro modifier (Req 11.1-11.5)
+ company_direction = derive_trend_direction(
+ weighted_sentiment_average(signals),
+ ).value
+ signals, macro_modifier = integrate_macro_signals(
+ company_signals=signals,
+ macro_signals=macro_signals,
+ company_direction=company_direction,
+ macro_impacts=macro_impacts,
+ ticker=ticker,
+ probabilistic=True,
+ macro_signal_weight=cfg.macro_signal_weight,
+ )
+ else:
+ # Heuristic mode: simple additive merge (current behavior)
+ signals = signals + macro_signals
logger.info(
- "Merged %d macro signals for %s/%s",
- len(macro_signals), ticker, window,
+ "Merged %d macro signals for %s/%s (modifier=%.4f)",
+ len(macro_signals), ticker, window, macro_modifier,
)
# 5. Check competitive toggle and merge pattern/competitive signals
@@ -1065,9 +1493,17 @@ async def aggregate_company_window(
market_ctx=market_ctx if market_ctx.has_data else None,
max_evidence=cfg.max_evidence,
reference_time=reference_time,
probabilistic=probabilistic,
regime=regime,
)
summary = assembled.summary
# 6b. Enrich probabilistic JSONB with macro modifier (Req 16.2)
if probabilistic and macro_modifier != 1.0:
ctx = summary.market_context
if isinstance(ctx, dict) and "probabilistic" in ctx:
ctx["probabilistic"]["macro_modifier"] = round(macro_modifier, 4)
# 7. Persist trend window
trend_id = await persist_trend_summary(pool, summary)
@@ -1136,10 +1572,80 @@ async def aggregate_company(
if reference_time is None:
reference_time = datetime.now(timezone.utc)
# Read probabilistic scoring flag once per cycle (Requirement 16.7).
# Mid-cycle changes take effect on the next cycle.
probabilistic = await fetch_probabilistic_scoring_enabled(pool)
pipeline_mode = "probabilistic" if probabilistic else "heuristic"
logger.info(
"Aggregation cycle for %s: pipeline_mode=%s",
ticker,
pipeline_mode,
)
# --- Regime detection (Req 7.1, 7.2, 7.3, 7.8, 7.9) ---
# Classify market regime for this ticker using closing prices and returns.
# Default to uncertainty regime when market data is unavailable.
regime: RegimeClassification | None = None
ticker_returns: list[float] | None = None
ticker_volumes: list[float] | None = None
source_accuracy_map: dict[str, float] | None = None
if probabilistic:
regime = await _classify_ticker_regime(pool, ticker)
logger.info(
"Regime for %s: %s (trend_indicator=%.1f, vol_ratio=%.2f, "
"bullish_threshold=%.2f, contradiction_mult=%.1f)",
ticker,
regime.regime.value,
regime.trend_indicator,
regime.volatility_ratio,
regime.bullish_threshold,
regime.contradiction_penalty_multiplier,
)
# Fetch market data (returns/volumes) for regime multiplier in scoring
# (Req 6.1-6.5). Fetched once per cycle and reused across all windows.
ticker_returns, ticker_volumes = await _fetch_ticker_market_data(pool, ticker)
# Batch-fetch source accuracy for all sources in the signal set
# (Req 4.1-4.3). Fetched once per cycle; individual signals look up
# their factor from this map. DB errors default to empty map (factor 1.0).
try:
# Fetch all source IDs from the longest window to cover all signals
longest_window = max(
cfg.effective_windows(),
key=lambda w: WINDOW_DURATIONS.get(w, timedelta(days=7)),
)
longest_duration = WINDOW_DURATIONS.get(longest_window, timedelta(days=90))
window_start = reference_time - longest_duration
all_impacts = await fetch_impact_records(pool, ticker, window_start, reference_time)
source_ids = list({imp.document_id for imp in all_impacts})
if source_ids:
sa_records = await fetch_source_accuracy(pool, source_ids)
source_accuracy_map = {
sid: sa.accuracy_factor for sid, sa in sa_records.items()
}
logger.info(
"Fetched source accuracy for %s: %d/%d sources have records",
ticker, len(sa_records), len(source_ids),
)
except Exception:
logger.warning(
"Failed to fetch source accuracy for %s — defaulting to neutral factor",
ticker,
exc_info=True,
)
source_accuracy_map = None
summaries: list[TrendSummary] = []
for window in cfg.effective_windows():
summary = await aggregate_company_window(
pool, ticker, window, reference_time, cfg,
probabilistic=probabilistic,
regime=regime,
source_accuracy_map=source_accuracy_map,
ticker_returns=ticker_returns,
ticker_volumes=ticker_volumes,
)
summaries.append(summary)
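Review note: the source-accuracy prefetch picks the longest configured window, but it ranks unknown windows with a 7-day fallback and then looks the winner up with a 90-day fallback. The sketch below shows one way to keep a single fallback so ranking and lookup cannot disagree. The window names, the durations table, and the function name are hypothetical placeholders; the real `WINDOW_DURATIONS` is not shown in the diff.

```python
from datetime import timedelta
from typing import Iterable

# Hypothetical table for illustration only.
WINDOW_DURATIONS_SKETCH = {
    "1d": timedelta(days=1),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}


def longest_window_duration(
    windows: Iterable[str],
    fallback: timedelta = timedelta(days=7),
) -> timedelta:
    """Return the longest duration among `windows`, using one fallback throughout."""
    return max(
        (WINDOW_DURATIONS_SKETCH.get(w, fallback) for w in windows),
        default=fallback,  # an empty window list also resolves to the fallback
    )
```

Computing the duration directly also avoids the second dictionary lookup on the winning window name.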