feat: signal math upgrade — probabilistic, regime-aware scoring pipeline
ci/woodpecker/push/test Pipeline was successful
ci/woodpecker/push/build-1 Pipeline was successful
ci/woodpecker/push/build-2 Pipeline was successful
ci/woodpecker/push/build-3 Pipeline was successful
ci/woodpecker/push/finalize Pipeline was successful
Build and Push / lint-and-test (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.adapters.broker_adapter name:broker-adapter]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.aggregation.worker name:aggregation]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.extractor.worker name:extractor]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.ingestion.worker name:ingestion]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.lake_publisher.worker name:lake-publisher]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.parser.worker name:parser]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.recommendation.worker name:recommendation]) (push) Has been cancelled
Build and Push / build-services (map[cmd:python -m services.scheduler.app name:scheduler]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.api.app:app --host 0.0.0.0 --port 8000 name:query-api]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.risk.app:app --host 0.0.0.0 --port 8000 name:risk]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.symbol_registry.app:app --host 0.0.0.0 --port 8000 name:symbol-registry]) (push) Has been cancelled
Build and Push / build-services (map[cmd:uvicorn services.trading.app:app --host 0.0.0.0 --port 8000 name:trading-engine]) (push) Has been cancelled
Build and Push / build-dashboard (push) Has been cancelled
Build and Push / build-superset (push) Has been cancelled
Build and Push / integration-test (push) Has been cancelled
Build and Push / beta-gate (push) Has been cancelled
Implement the full probabilistic signal-processing pipeline, gated behind the probabilistic_scoring_enabled feature flag in risk_configs:

- Bayesian log-likelihood accumulator with Beta posterior and entropy
- Regime detector (trend-following, panic, mean-reversion, uncertainty)
- Source accuracy tracker with per-source historical prediction accuracy
- Sigmoid confidence gate replacing the binary gate
- Information-gain surprise weighting for rare events
- Adaptive recency decay with event-specific half-lives
- Regime multiplier replacing the market context multiplier
- Weighted disagreement entropy for contradiction detection
- Multiplicative macro exposure with conditional integration
- Graph-distance-attenuated competitive signal propagation
- Exponentially weighted momentum with volatility scaling
- Expected-value recommendation gate

All changes are backward-compatible: flag=false preserves the exact current behavior. New outputs are stored in existing JSONB columns; the only schema change is the source_accuracy table, added via migration 034.

Tests: 26 property-based tests (14 correctness properties), 99 unit tests; 1789 tests passing in total, with zero regressions.
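The regime detector listed above classifies each ticker from two quantities. The first is a trend sign R = sign(EMA_20 - EMA_100) over closing prices. The second is a volatility ratio V_r = σ_20 / σ_100 over returns. PANIC takes priority when V_r exceeds panic_vol_ratio, which is 1.5 in RegimeConfig.

Below is a minimal sketch of those two inputs and the panic check. It assumes the default 20/100-day windows and skips the insufficient-data fallbacks to UNCERTAINTY. The helper names are illustrative, and the rules for the other regimes (trend_vol_ratio, mean_reversion_vol_ratio) are not reproduced.

```python
import statistics


def ema(values: list[float], period: int) -> float:
    """EMA of the last `period` values, seeded with the first of them."""
    data = values[-period:]  # all values if fewer than `period`
    k = 2.0 / (period + 1)
    result = data[0]
    for v in data[1:]:
        result += (v - result) * k
    return result


def trend_and_vol_ratio(prices: list[float], returns: list[float]) -> tuple[float, float]:
    """R = sign(EMA_20 - EMA_100) and V_r = stdev(last 20 returns) / stdev(last 100)."""
    diff = ema(prices, 20) - ema(prices, 100)
    r = float((diff > 0) - (diff < 0))
    v_r = statistics.stdev(returns[-20:]) / statistics.stdev(returns[-100:])
    return r, v_r


def is_panic(v_r: float, panic_vol_ratio: float = 1.5) -> bool:
    # Hypothetical helper: only the panic rule is modeled here
    return v_r > panic_vol_ratio
```

Take a steadily rising price series, so R = 1. Pair it with a return series whose last 20 values swing ±5% after 80 values of ±1%. That gives V_r ≈ 2.1, above the default panic ratio of 1.5.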
@@ -0,0 +1,127 @@
"""Bayesian accumulator for probabilistic sentiment aggregation.

Accumulates weighted signals into a Bayesian posterior using
log-likelihood accumulation, Beta distribution parameters, and
Shannon entropy for mixed-signal detection.

Requirements: 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 9.1, 9.7
"""
from __future__ import annotations

import math
from dataclasses import dataclass

from services.aggregation.scoring import WeightedSignal


@dataclass(frozen=True)
class BayesianPosterior:
    """Bayesian posterior state from signal accumulation."""

    p_bull: float  # σ(L_t), bullish probability [0, 1]
    alpha: float  # Beta distribution α parameter (≥ 1.0)
    beta: float  # Beta distribution β parameter (≥ 1.0)
    log_likelihood: float  # Raw log-likelihood accumulation L_t
    bayesian_confidence: float  # 1 - 4αβ/(α+β)², [0, 1]
    entropy: float  # Shannon entropy H, [0, 1]
    signal_count: int  # Number of signals processed


# Uninformative prior (no evidence)
PRIOR = BayesianPosterior(
    p_bull=0.5,
    alpha=1.0,
    beta=1.0,
    log_likelihood=0.0,
    bayesian_confidence=0.0,
    entropy=1.0,
    signal_count=0,
)


def compute_entropy(p_bull: float) -> float:
    """Shannon entropy H = -p·log₂(p) - (1-p)·log₂(1-p).

    Returns value in [0, 1]. Maximum at p=0.5, zero at p=0 or p=1.
    Handles edge cases p≤0 and p≥1 by returning 0.0.
    """
    if p_bull <= 0.0 or p_bull >= 1.0:
        return 0.0
    q = 1.0 - p_bull
    return -(p_bull * math.log2(p_bull) + q * math.log2(q))


def compute_bayesian_posterior(
    signals: list[WeightedSignal],
) -> BayesianPosterior:
    """Accumulate weighted signals into a Bayesian posterior.

    Computes:
        - Log-likelihood: L_t = Σ(w_i · s_i)
        - Bullish probability: P_bull = σ(L_t)
        - Beta posterior: α = 1 + W_bull, β = 1 + W_bear
        - Bayesian confidence: C = 1 - 4αβ/(α+β)²
        - Shannon entropy: H = -p·log₂(p) - (1-p)·log₂(1-p)

    Returns PRIOR for empty signal lists.
    Skips signals with NaN weight or sentiment.
    """
    if not signals:
        return PRIOR

    log_likelihood = 0.0
    w_bull = 0.0
    w_bear = 0.0
    count = 0

    for sig in signals:
        combined = sig.weight.combined
        sentiment = sig.sentiment_value

        # Skip signals with NaN weight or sentiment
        if math.isnan(combined) or math.isnan(sentiment):
            continue

        log_likelihood += combined * sentiment

        if sentiment > 0.0:
            w_bull += combined
        elif sentiment < 0.0:
            w_bear += combined

        count += 1

    if count == 0:
        return PRIOR

    # P_bull via sigmoid: σ(L_t) = 1 / (1 + exp(-L_t))
    # Guard against overflow in exp for very large |L_t|
    if log_likelihood > 500.0:
        p_bull = 1.0
    elif log_likelihood < -500.0:
        p_bull = 0.0
    else:
        p_bull = 1.0 / (1.0 + math.exp(-log_likelihood))

    # Beta posterior parameters
    alpha = 1.0 + w_bull
    beta_param = 1.0 + w_bear

    # Bayesian confidence: C = 1 - 4αβ/(α+β)²
    ab_sum = alpha + beta_param
    bayesian_confidence = 1.0 - (4.0 * alpha * beta_param) / (ab_sum * ab_sum)
    # Clamp to [0, 1] to guard against floating-point rounding
    bayesian_confidence = max(0.0, min(1.0, bayesian_confidence))

    # Shannon entropy
    entropy = compute_entropy(p_bull)

    return BayesianPosterior(
        p_bull=p_bull,
        alpha=alpha,
        beta=beta_param,
        log_likelihood=log_likelihood,
        bayesian_confidence=bayesian_confidence,
        entropy=entropy,
        signal_count=count,
    )
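Below is a standalone sketch of the same accumulation that makes the worked numbers easy to check. It assumes signals arrive as plain (combined_weight, sentiment) pairs instead of the WeightedSignal objects imported from services.aggregation.scoring; the Posterior class is an illustrative stand-in. It mirrors the ±500 overflow guard. With no usable signals it reduces to the same values as PRIOR.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Posterior:
    p_bull: float
    alpha: float
    beta: float
    log_likelihood: float
    confidence: float
    entropy: float


def entropy_bits(p: float) -> float:
    """Binary Shannon entropy in bits; defined as 0 at p <= 0 or p >= 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


def posterior(signals: list[tuple[float, float]]) -> Posterior:
    """Accumulate (combined_weight, sentiment) pairs; NaN pairs are skipped.

    With no usable signals this reduces to the uninformative prior:
    p_bull = 0.5, alpha = beta = 1, confidence = 0, entropy = 1.
    """
    ll = w_bull = w_bear = 0.0
    for weight, sentiment in signals:
        if math.isnan(weight) or math.isnan(sentiment):
            continue
        ll += weight * sentiment  # L_t = sum of w_i * s_i
        if sentiment > 0.0:
            w_bull += weight
        elif sentiment < 0.0:
            w_bear += weight
    # Sigmoid with the same +/-500 overflow guard as compute_bayesian_posterior
    if ll > 500.0:
        p = 1.0
    elif ll < -500.0:
        p = 0.0
    else:
        p = 1.0 / (1.0 + math.exp(-ll))
    a, b = 1.0 + w_bull, 1.0 + w_bear  # Beta(alpha, beta) posterior
    conf = max(0.0, min(1.0, 1.0 - 4.0 * a * b / (a + b) ** 2))
    return Posterior(p, a, b, ll, conf, entropy_bits(p))
```

Consider one bullish signal of weight 1.0 and one bearish signal of weight 0.5. Then L_t = 0.5, so P_bull = σ(0.5) ≈ 0.622, α = 2.0, β = 1.5 and C = 1 - 12/12.25 ≈ 0.020. The log-odds lean bullish, but the Beta confidence stays near zero because the bullish and bearish weights are similar.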
@@ -4,10 +4,11 @@ Analyses weighted signals to detect and represent disagreement explicitly,
rather than collapsing contradictory evidence into a single unsupported
conclusion.

Requirements: 6.4, 6.5
Requirements: 6.4, 6.5, 15.1–15.7
"""
from __future__ import annotations

import math
from dataclasses import dataclass

from services.aggregation.scoring import WeightedSignal
@@ -35,6 +36,9 @@ class ContradictionResult:
def detect_contradictions(
    signals: list[WeightedSignal],
    catalyst_entries: list[CatalystEntry] | None = None,
    *,
    probabilistic: bool = False,
    w_threshold: float = 5.0,
) -> ContradictionResult:
    """Run contradiction detection across multiple dimensions.

@@ -42,6 +46,16 @@ def detect_contradictions(
    1. Sentiment disagreement — the core positive-vs-negative split
    2. Catalyst disagreement — same catalyst type with opposing sentiment

    When ``probabilistic`` is True, the overall score uses weighted
    disagreement entropy (Req 15.1–15.7) instead of the minority/majority
    ratio. When False, the existing ratio formula is preserved exactly.

    Args:
        signals: Weighted signals to analyse.
        catalyst_entries: Optional catalyst metadata for per-catalyst analysis.
        probabilistic: Use entropy-based scoring when True.
        w_threshold: Evidence mass threshold for entropy weighting (default 5.0).

    Returns a ContradictionResult with an overall score and per-dimension
    disagreement details.
    """
@@ -55,7 +69,10 @@ def detect_contradictions(
        catalyst_details = _detect_catalyst_disagreement(signals, catalyst_entries)
        details.extend(catalyst_details)

    score = _compute_overall_score(signals)
    if probabilistic:
        score = _compute_entropy_score(signals, w_threshold)
    else:
        score = _compute_overall_score(signals)

    return ContradictionResult(score=score, details=details)

@@ -82,6 +99,58 @@ def _compute_overall_score(signals: list[WeightedSignal]) -> float:
    return round(minority / total, 4)


def _compute_entropy_score(
    signals: list[WeightedSignal],
    w_threshold: float = 5.0,
) -> float:
    """Weighted disagreement entropy — probabilistic contradiction score.

    Computes Shannon entropy over the positive/negative weight distribution,
    weighted by evidence mass relative to a configurable threshold.

    Formula:
        f_pos = W_pos / (W_pos + W_neg)
        f_neg = 1 - f_pos
        H = -f_pos·log₂(f_pos) - f_neg·log₂(f_neg)  (in [0, 1])
        score = H · min(1.0, (W_pos + W_neg) / W_threshold)

    Returns 0.0 when only one direction exists (no disagreement).

    Requirements: 15.1–15.7
    """
    if not signals:
        return 0.0

    pos_weight = 0.0
    neg_weight = 0.0
    for sig in signals:
        w = sig.weight.combined * sig.impact_score
        if sig.sentiment_value > 0:
            pos_weight += w
        elif sig.sentiment_value < 0:
            neg_weight += w

    # No disagreement when only one direction exists (Req 15.5)
    if pos_weight <= 0.0 or neg_weight <= 0.0:
        return 0.0

    total = pos_weight + neg_weight

    # Compute weight fractions (Req 15.2)
    f_pos = pos_weight / total
    f_neg = neg_weight / total  # = 1 - f_pos

    # Shannon entropy H = -f_pos·log₂(f_pos) - f_neg·log₂(f_neg) (Req 15.3)
    # Guard against log₂(0) — already handled by the early return above
    h_contradiction = -f_pos * math.log2(f_pos) - f_neg * math.log2(f_neg)

    # Weight by evidence mass (Req 15.4)
    evidence_factor = min(1.0, total / w_threshold) if w_threshold > 0.0 else 1.0
    score = h_contradiction * evidence_factor

    return round(score, 4)


def _detect_sentiment_disagreement(
    signals: list[WeightedSignal],
) -> DisagreementDetail | None:

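Below is a minimal sketch of the entropy-based contradiction score from _compute_entropy_score. It assumes each signal is reduced to a (combined_weight, impact_score, sentiment) triple, and the function name is illustrative. Unlike the Bayesian accumulator, this score weights each signal by combined weight times impact score.

```python
import math


def entropy_contradiction_score(
    signals: list[tuple[float, float, float]],
    w_threshold: float = 5.0,
) -> float:
    """Score disagreement from (combined_weight, impact_score, sentiment) triples."""
    pos = neg = 0.0
    for weight, impact, sentiment in signals:
        mass = weight * impact
        if sentiment > 0:
            pos += mass
        elif sentiment < 0:
            neg += mass
    if pos <= 0.0 or neg <= 0.0:
        return 0.0  # one-sided evidence: no disagreement
    total = pos + neg
    f_pos, f_neg = pos / total, neg / total
    h = -f_pos * math.log2(f_pos) - f_neg * math.log2(f_neg)  # bits, in [0, 1]
    evidence = min(1.0, total / w_threshold) if w_threshold > 0.0 else 1.0
    return round(h * evidence, 4)
```

Two equal masses of 2.0 on each side give the maximum H = 1 bit. Their total mass of 4.0 is below the default threshold of 5.0, so the evidence factor is 0.8 and the score is 0.8. A 9:1 split with total mass 10 saturates the evidence factor but scores only about 0.469, because most of the weight agrees.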
@@ -283,27 +283,82 @@ def _determine_impact_direction(
# ---------------------------------------------------------------------------


def _compute_multiplicative_exposure(
    geo_overlap: float,
    supply_overlap: float,
    commodity_overlap: float,
    sector_match: float,
) -> float:
    """Compute multiplicative compounding exposure.

    Formula: 1 - Π_k(1 - w_k · O_k)

    Multi-dimensional exposure compounds like a noisy-OR: each additional
    exposed dimension raises the score, but with diminishing returns, so the
    result never exceeds the linear weighted sum.

    Returns a value in [0, ~0.689] (max when all overlaps are 1.0).

    Requirements: 10.1, 10.4, 10.7
    """
    product = (
        (1.0 - GEO_WEIGHT * geo_overlap)
        * (1.0 - SUPPLY_WEIGHT * supply_overlap)
        * (1.0 - COMMODITY_WEIGHT * commodity_overlap)
        * (1.0 - SECTOR_WEIGHT * sector_match)
    )
    return 1.0 - product


def _compute_linear_exposure(
    geo_overlap: float,
    supply_overlap: float,
    commodity_overlap: float,
    sector_match: float,
) -> float:
    """Compute linear weighted-sum exposure (original heuristic formula).

    Formula: w_geo·O_geo + w_supply·O_supply + w_commodity·O_commodity + w_sector·O_sector

    Returns a value in [0, 1].
    """
    return (
        GEO_WEIGHT * geo_overlap
        + SUPPLY_WEIGHT * supply_overlap
        + COMMODITY_WEIGHT * commodity_overlap
        + SECTOR_WEIGHT * sector_match
    )


def compute_macro_impact(
    event: GlobalEvent,
    profile: ExposureProfileSchema,
    *,
    probabilistic: bool = False,
) -> MacroImpactRecord:
    """Compute the macro impact of a global event on a company.

    Scoring formula:
        When ``probabilistic=False`` (default), uses the linear weighted-sum:
            raw_score = severity_weight * (
                0.35 * geographic_overlap +
                0.25 * supply_chain_overlap +
                0.25 * commodity_overlap +
                0.15 * sector_match
            )
            final_score = apply_resilience_modifier(raw_score, tier, is_international)

        When ``probabilistic=True``, uses multiplicative compounding exposure:
            raw_score = severity_weight * (1 - Π_k(1 - w_k · O_k))

        In both modes, the resilience modifier is applied after the raw score.

    Args:
        event: The classified global event.
        profile: The company's exposure profile.
        probabilistic: Use multiplicative formula when True.

    Returns:
        A MacroImpactRecord with the computed score and metadata.

    Requirements: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6
    """
    now = datetime.now(timezone.utc)

@@ -360,13 +415,16 @@ def compute_macro_impact(
    # Severity weight
    severity_weight = SEVERITY_WEIGHTS.get(event.severity, 0.25)

    # Raw score
    raw_score = severity_weight * (
        GEO_WEIGHT * geo_overlap
        + SUPPLY_WEIGHT * supply_overlap
        + COMMODITY_WEIGHT * commodity_overlap
        + SECTOR_WEIGHT * sector_match
    )
    # Raw score: multiplicative or linear depending on mode
    if probabilistic:
        exposure = _compute_multiplicative_exposure(
            geo_overlap, supply_overlap, commodity_overlap, sector_match,
        )
    else:
        exposure = _compute_linear_exposure(
            geo_overlap, supply_overlap, commodity_overlap, sector_match,
        )
    raw_score = severity_weight * exposure

    # Determine if event is international (affects multiple regions)
    is_international = len(event.affected_regions) > 1
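Below is a quick comparison of the two exposure formulas feeding raw_score = severity_weight * exposure. It assumes the weights quoted in the compute_macro_impact docstring (0.35, 0.25, 0.25, 0.15). The real GEO_WEIGHT, SUPPLY_WEIGHT, COMMODITY_WEIGHT and SECTOR_WEIGHT constants are defined outside the hunks shown. The helper names and the severity value in the checks are illustrative.

```python
import math

# Assumed weights for (geo, supply chain, commodity, sector), taken from the
# compute_macro_impact docstring; the module's real constants live elsewhere.
WEIGHTS = (0.35, 0.25, 0.25, 0.15)
Overlaps = tuple[float, float, float, float]


def linear_exposure(overlaps: Overlaps) -> float:
    # sum_k w_k * O_k
    return sum(w * o for w, o in zip(WEIGHTS, overlaps))


def multiplicative_exposure(overlaps: Overlaps) -> float:
    # 1 - prod_k(1 - w_k * O_k): a noisy-OR over the exposure dimensions
    return 1.0 - math.prod(1.0 - w * o for w, o in zip(WEIGHTS, overlaps))


def raw_score(severity_weight: float, overlaps: Overlaps, probabilistic: bool) -> float:
    exposure = multiplicative_exposure(overlaps) if probabilistic else linear_exposure(overlaps)
    return severity_weight * exposure
```

With a single nonzero overlap the two formulas agree, since 1 - (1 - w·O) = w·O. With several nonzero overlaps, the multiplicative form is never larger than the linear sum. At full overlap it gives 1 - 0.65·0.75·0.75·0.85 ≈ 0.689 versus 1.0 for the linear form.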
@@ -406,19 +464,27 @@ def compute_macro_impact_with_sector(
    event: GlobalEvent,
    profile: ExposureProfileSchema,
    company_sector: str = "",
    *,
    probabilistic: bool = False,
) -> MacroImpactRecord:
    """Compute macro impact with explicit sector matching.

    Like compute_macro_impact but accepts a company_sector parameter
    for proper sector_match computation.

    When ``probabilistic=True``, uses multiplicative compounding exposure.
    When ``probabilistic=False``, uses the original linear weighted sum.

    Args:
        event: The classified global event.
        profile: The company's exposure profile.
        company_sector: The company's GICS sector name.
        probabilistic: Use multiplicative formula when True.

    Returns:
        A MacroImpactRecord with the computed score and metadata.

    Requirements: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6
    """
    now = datetime.now(timezone.utc)

@@ -472,13 +538,16 @@ def compute_macro_impact_with_sector(
    # Severity weight
    severity_weight = SEVERITY_WEIGHTS.get(event.severity, 0.25)

    # Raw score
    raw_score = severity_weight * (
        GEO_WEIGHT * geo_overlap
        + SUPPLY_WEIGHT * supply_overlap
        + COMMODITY_WEIGHT * commodity_overlap
        + SECTOR_WEIGHT * sector_match
    )
    # Raw score: multiplicative or linear depending on mode
    if probabilistic:
        exposure = _compute_multiplicative_exposure(
            geo_overlap, supply_overlap, commodity_overlap, sector_match,
        )
    else:
        exposure = _compute_linear_exposure(
            geo_overlap, supply_overlap, commodity_overlap, sector_match,
        )
    raw_score = severity_weight * exposure

    # International check
    is_international = len(event.affected_regions) > 1
@@ -588,6 +657,154 @@ def _infer_commodities(sector: str, industry: str) -> list[str]:
    return sector_commodities.get(sector, [])


# ---------------------------------------------------------------------------
# Conditional macro signal integration (Requirements: 11.1–11.5)
# ---------------------------------------------------------------------------


def compute_conditional_macro_modifier(
    company_strength: float,
    company_direction: str,
    macro_impact: float,
    macro_direction: str,
) -> float:
    """Compute the multiplicative macro modifier for conditional integration.

    When both company and macro signals exist, macro acts as a modifier:
        S_adjusted = S_company · clamp(1 + M_macro · sign_alignment, 0.5, 1.5)

    sign_alignment is +1 when macro and company agree in direction,
    -1 when they disagree, and 0 when either direction is neutral or
    mixed (which yields a modifier of exactly 1.0).

    Args:
        company_strength: The company-level signal strength (absolute).
        company_direction: Company trend direction (bullish/bearish/neutral/mixed).
        macro_impact: Normalized macro impact score in [0, 1].
        macro_direction: Macro impact direction (positive/negative/mixed/neutral).

    Returns:
        The multiplicative modifier in [0.5, 1.5].

    Requirements: 11.1, 11.2
    """
    # Determine sign alignment between company and macro directions
    _DIRECTION_SIGN = {
        "bullish": 1,
        "positive": 1,
        "bearish": -1,
        "negative": -1,
    }
    company_sign = _DIRECTION_SIGN.get(company_direction, 0)
    macro_sign = _DIRECTION_SIGN.get(macro_direction, 0)

    if company_sign == 0 or macro_sign == 0:
        # Neutral or mixed directions — no alignment signal
        sign_alignment = 0.0
    elif company_sign == macro_sign:
        sign_alignment = 1.0
    else:
        sign_alignment = -1.0

    raw_modifier = 1.0 + macro_impact * sign_alignment
    return max(0.5, min(1.5, raw_modifier))


def integrate_macro_signals(
    company_signals: list,
    macro_signals: list,
    company_direction: str,
    macro_impacts: list,
    ticker: str = "",
    *,
    probabilistic: bool = False,
    macro_signal_weight: float = 0.3,
) -> tuple[list, float]:
    """Integrate macro signals with company signals.

    When ``probabilistic=True``:
        - Both exist: apply macro as multiplicative modifier on company signals
        - Only macro: fall back to additive behavior with weight 0.3
        - Only company: use modifier = 1.0 (no change)

    When ``probabilistic=False``:
        - Preserve current additive merge behavior (concatenate lists)

    Args:
        company_signals: WeightedSignal list from company layer.
        macro_signals: WeightedSignal list from macro layer.
        company_direction: Derived company trend direction string.
        macro_impacts: List of MacroImpactRecord or similar with
            macro_impact_score and impact_direction attributes.
        ticker: Ticker symbol for logging.
        probabilistic: Use conditional modifier when True.
        macro_signal_weight: Weight for macro-only fallback (default 0.3).

    Returns:
        Tuple of (merged_signals, macro_modifier_applied).
        macro_modifier_applied is 1.0 when no modifier was used.

    Requirements: 11.1, 11.2, 11.3, 11.4, 11.5
    """
    if not probabilistic:
        # Heuristic mode: simple additive merge (current behavior)
        merged = list(company_signals) + list(macro_signals)
        return merged, 1.0

    has_company = len(company_signals) > 0
    has_macro = len(macro_signals) > 0

    if has_company and has_macro:
        # Compute average macro impact and dominant direction
        avg_macro_impact = 0.0
        direction_counts: dict[str, float] = {}
        for mir in macro_impacts:
            score = getattr(mir, "macro_impact_score", 0.0)
            direction = getattr(mir, "impact_direction", "neutral")
            avg_macro_impact += score
            direction_counts[direction] = direction_counts.get(direction, 0.0) + score

        if macro_impacts:
            avg_macro_impact /= len(macro_impacts)

        # Dominant macro direction by total impact weight
        macro_direction = max(direction_counts, key=direction_counts.get) if direction_counts else "neutral"

        modifier = compute_conditional_macro_modifier(
            company_strength=0.0,  # not used in current formula
            company_direction=company_direction,
            macro_impact=avg_macro_impact,
            macro_direction=macro_direction,
        )

        logger.info(
            "Macro modifier for %s: %.4f (avg_impact=%.4f, macro_dir=%s, company_dir=%s)",
            ticker, modifier, avg_macro_impact, macro_direction, company_direction,
        )

        # Apply modifier to company signals by scaling their impact scores
        # We create modified copies rather than mutating originals
        from copy import copy
        modified_signals = []
        for sig in company_signals:
            new_sig = copy(sig)
            new_sig.impact_score = sig.impact_score * modifier
            modified_signals.append(new_sig)

        return modified_signals, modifier

    if has_macro and not has_company:
        # Macro-only fallback: additive behavior with weight 0.3 (Req 11.3)
        logger.info(
            "Macro-only fallback for %s: using additive merge with weight %.2f",
            ticker, macro_signal_weight,
        )
        return list(macro_signals), 1.0

    # Company-only: no modification (Req 11.4)
    logger.info("Company-only signals for %s: macro modifier=1.0", ticker)
    return list(company_signals), 1.0


# ---------------------------------------------------------------------------
# PostgreSQL persistence
# ---------------------------------------------------------------------------

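Below is a minimal sketch of the two steps in the probabilistic "both exist" branch. The first step picks the dominant macro direction by total impact; the second applies the clamped modifier. It assumes impacts are plain (macro_impact_score, impact_direction) pairs rather than MacroImpactRecord objects. It also drops the unused company_strength argument, and the helper names are illustrative.

```python
DIRECTION_SIGN = {"bullish": 1, "positive": 1, "bearish": -1, "negative": -1}


def macro_modifier(company_direction: str, macro_impact: float, macro_direction: str) -> float:
    """clamp(1 + M_macro * sign_alignment, 0.5, 1.5)."""
    c = DIRECTION_SIGN.get(company_direction, 0)
    m = DIRECTION_SIGN.get(macro_direction, 0)
    if c == 0 or m == 0:
        alignment = 0.0  # neutral or mixed on either side
    elif c == m:
        alignment = 1.0
    else:
        alignment = -1.0
    return max(0.5, min(1.5, 1.0 + macro_impact * alignment))


def dominant_macro(impacts: list[tuple[float, str]]) -> tuple[float, str]:
    """Average impact score and the direction carrying the largest total score."""
    if not impacts:
        return 0.0, "neutral"
    totals: dict[str, float] = {}
    for score, direction in impacts:
        totals[direction] = totals.get(direction, 0.0) + score
    avg = sum(score for score, _ in impacts) / len(impacts)
    return avg, max(totals, key=totals.get)
```

An aligned macro impact of 0.3 scales company impact scores by 1.3. An opposing impact of 0.8 would give 0.2 but is clamped to 0.5. A neutral or mixed direction on either side leaves the modifier at exactly 1.0.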
@@ -4,7 +4,7 @@ Computes TrendProjection objects by combining current trend momentum,
macro signal decay trajectories, and upcoming catalyst outlook.
Projections are persisted alongside trend_window records.

Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.9
Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.9, 13.1, 13.2, 13.3, 13.4, 13.5, 13.6
"""
from __future__ import annotations

@@ -126,6 +126,87 @@ def _direction_sign(direction: str) -> float:
    return 0.0


# ---------------------------------------------------------------------------
# Exponentially weighted momentum (Requirements: 13.1–13.6)
# ---------------------------------------------------------------------------


def compute_ew_momentum(
    strength_changes: list[float],
    lambda_decay: float = 0.7,
) -> float:
    """Compute exponentially weighted momentum from historical strength changes.

    Formula: M_t = Σ_{k=0}^{K-1} λ^k · ΔS_{t-k}
    Normalized by geometric series sum Σ λ^k to produce value in [-1, 1].

    When fewer than 2 historical cycles are available, returns 0.0
    (caller should fall back to heuristic).

    Args:
        strength_changes: List of signed strength changes ΔS, most recent first.
            Each value represents the change in signed trend strength from one
            cycle to the next. Positive = strengthening bullish / weakening bearish.
        lambda_decay: Decay factor λ (default 0.7). Must be in (0, 1).

    Returns:
        Normalized momentum in [-1, 1]. Returns 0.0 for empty or single-element lists.

    Requirements: 13.1, 13.2, 13.3, 13.6
    """
    if len(strength_changes) < 2:
        return 0.0

    # Use up to K=10 most recent changes, filtering out NaN values
    k_max = min(len(strength_changes), 10)
    changes = strength_changes[:k_max]

    weighted_sum = 0.0
    weight_sum = 0.0
    for k, delta_s in enumerate(changes):
        if math.isnan(delta_s):
            continue
        w = lambda_decay ** k
        weighted_sum += w * delta_s
        weight_sum += w

    if weight_sum == 0.0:
        return 0.0

    normalized = weighted_sum / weight_sum
    # Guard against NaN propagation
    if math.isnan(normalized) or math.isinf(normalized):
        return 0.0
    return max(-1.0, min(1.0, normalized))


def compute_volatility_scaled_momentum(
    momentum: float,
    sigma_20: float,
) -> float:
    """Compute volatility-scaled momentum.

    Formula: M_adj = M_t / max(σ_20, 0.01), clamped to [-2.0, 2.0].

    Normalizes momentum relative to the ticker's typical price movement.

    Args:
        momentum: Raw or EW momentum value.
        sigma_20: 20-day return standard deviation.

    Returns:
        Volatility-scaled momentum in [-2.0, 2.0].

    Requirements: 13.4, 13.5
    """
    denominator = max(sigma_20, 0.01)
    scaled = momentum / denominator
    # Guard against NaN propagation
    if math.isnan(scaled) or math.isinf(scaled):
        return 0.0
    return max(-2.0, min(2.0, scaled))


# ---------------------------------------------------------------------------
# Macro signal decay projection
# ---------------------------------------------------------------------------

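Below is a standalone sketch of the two momentum helpers, useful for checking the weights by hand. With λ = 0.7 the three most recent changes get weights 1, 0.7 and 0.49. The shorter function names are illustrative; the logic follows compute_ew_momentum and compute_volatility_scaled_momentum.

```python
import math


def ew_momentum(strength_changes: list[float], lambda_decay: float = 0.7) -> float:
    """Weighted average of the K <= 10 most recent changes (most recent first)."""
    if len(strength_changes) < 2:
        return 0.0
    weighted = total_w = 0.0
    for k, delta in enumerate(strength_changes[:10]):
        if math.isnan(delta):
            continue  # skipped, but still consumes lag index k
        w = lambda_decay ** k
        weighted += w * delta
        total_w += w
    if total_w == 0.0:
        return 0.0
    m = weighted / total_w
    if math.isnan(m) or math.isinf(m):
        return 0.0
    return max(-1.0, min(1.0, m))


def volatility_scaled(momentum: float, sigma_20: float) -> float:
    """M_adj = M_t / max(sigma_20, 0.01), clamped to [-2, 2]."""
    scaled = momentum / max(sigma_20, 0.01)
    if math.isnan(scaled) or math.isinf(scaled):
        return 0.0
    return max(-2.0, min(2.0, scaled))
```

For ΔS = [0.2, -0.1], M_t = (0.2 - 0.07) / 1.7 ≈ 0.076. The sum is divided by the total weight, so M_t is a weighted average of the changes. The [-1, 1] clamp therefore only binds when individual changes exceed 1 in magnitude. Dividing by σ_20 = 0.02 then gives about 3.8, which the scaled helper clamps to 2.0. The 0.01 floor also means any σ_20 below 1% is treated as exactly 1%.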
@@ -0,0 +1,170 @@
|
||||
"""Regime detector for market regime classification.
|
||||
|
||||
Classifies the current market regime for each ticker based on
|
||||
EMA trend indicators and volatility ratios. Adjusts scoring
|
||||
thresholds and contradiction penalties per regime.
|
||||
|
||||
Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.9
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import math
|
||||
import statistics
|
||||
from dataclasses import dataclass
|
||||
from enum import Enum
|
||||
|
||||
|
||||
class MarketRegime(str, Enum):
|
||||
"""Market regime classification categories."""
|
||||
|
||||
TREND_FOLLOWING = "trend_following"
|
||||
PANIC = "panic"
|
||||
MEAN_REVERSION = "mean_reversion"
|
||||
UNCERTAINTY = "uncertainty"
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class RegimeClassification:
|
||||
"""Result of regime detection for a ticker."""
|
||||
|
||||
regime: MarketRegime
|
||||
trend_indicator: float # R = sign(EMA_20 - EMA_100)
|
||||
volatility_ratio: float # V_r = σ_20 / σ_100
|
||||
bullish_threshold: float # Adjusted ±threshold for direction
|
||||
bearish_threshold: float
|
||||
contradiction_penalty_multiplier: float # 0.4 default, 0.6 for uncertainty
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class RegimeConfig:
|
||||
"""Configuration parameters for regime detection."""
|
||||
|
||||
ema_short_period: int = 20
|
||||
ema_long_period: int = 100
|
||||
vol_short_period: int = 20
|
||||
vol_long_period: int = 100
|
||||
panic_vol_ratio: float = 1.5
|
||||
trend_vol_ratio: float = 1.2
|
||||
mean_reversion_vol_ratio: float = 1.0
|
||||
default_threshold: float = 0.15
|
||||
panic_threshold: float = 0.10
|
||||
mean_reversion_threshold: float = 0.20
|
||||
uncertainty_contradiction_multiplier: float = 0.6
|
||||
|
||||
|
||||
# Default uncertainty classification used when data is insufficient
|
||||
_DEFAULT_UNCERTAINTY = RegimeClassification(
|
||||
regime=MarketRegime.UNCERTAINTY,
|
||||
trend_indicator=0.0,
|
||||
volatility_ratio=1.0,
|
||||
bullish_threshold=0.15,
|
||||
bearish_threshold=-0.15,
|
||||
contradiction_penalty_multiplier=0.6,
|
||||
)
|
||||
|
||||
|
||||
def compute_ema(values: list[float], period: int) -> float:
|
||||
"""Compute exponential moving average over the last ``period`` values.
|
||||
|
||||
Uses the standard EMA formula with multiplier = 2 / (period + 1).
|
||||
Iterates through the values, seeding the EMA with the first value.
|
||||
|
||||
Raises ``ValueError`` when *values* is empty or *period* < 1.
|
||||
"""
|
||||
if not values or period < 1:
|
||||
raise ValueError("values must be non-empty and period must be >= 1")
|
||||
|
||||
# Use only the last `period` values (or all if fewer)
|
||||
data = values[-period:] if len(values) >= period else values
|
||||
|
||||
multiplier = 2.0 / (period + 1)
|
||||
ema = data[0]
|
||||
for value in data[1:]:
|
||||
ema = (value - ema) * multiplier + ema
|
||||
return ema
+
+
+def _sign(x: float) -> float:
+    """Return -1.0, 0.0, or 1.0 for the sign of *x*."""
+    if x > 0.0:
+        return 1.0
+    if x < 0.0:
+        return -1.0
+    return 0.0
+
+
+def classify_regime(
+    closing_prices: list[float],
+    returns: list[float],
+    config: RegimeConfig = RegimeConfig(),
+) -> RegimeClassification:
+    """Classify market regime from price and return history.
+
+    Requires at least ``config.ema_long_period`` closing prices and
+    ``config.vol_long_period`` returns. Falls back to UNCERTAINTY when
+    data is insufficient or either standard deviation is zero or NaN.
+
+    Requirements: 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.9
+    """
+    # Insufficient price data → uncertainty
+    if len(closing_prices) < config.ema_long_period:
+        return _DEFAULT_UNCERTAINTY
+
+    # Insufficient return data → uncertainty
+    if len(returns) < config.vol_long_period:
+        return _DEFAULT_UNCERTAINTY
+
+    # --- Trend indicator: R = sign(EMA_short - EMA_long) ---
+    ema_short = compute_ema(closing_prices, config.ema_short_period)
+    ema_long = compute_ema(closing_prices, config.ema_long_period)
+    trend_indicator = _sign(ema_short - ema_long)
+
+    # --- Volatility ratio: V_r = σ_short / σ_long ---
+    short_returns = returns[-config.vol_short_period:]
+    long_returns = returns[-config.vol_long_period:]
+
+    # Guard against too few samples and zero or NaN standard deviations
+    if len(short_returns) < 2 or len(long_returns) < 2:
+        return _DEFAULT_UNCERTAINTY
+
+    sigma_short = statistics.stdev(short_returns)
+    sigma_long = statistics.stdev(long_returns)
+
+    if sigma_long == 0.0 or sigma_short == 0.0:
+        return _DEFAULT_UNCERTAINTY
+
+    if math.isnan(sigma_short) or math.isnan(sigma_long):
+        return _DEFAULT_UNCERTAINTY
+
+    volatility_ratio = sigma_short / sigma_long
+
+    # --- Classification rules (Req 7.3) ---
+    # Panic takes priority: V_r > 1.5
+    if volatility_ratio > config.panic_vol_ratio:
+        regime = MarketRegime.PANIC
+        threshold = config.panic_threshold  # ±0.10
+        contradiction_mult = 0.4
+    # Trend-following: R ≠ 0 AND V_r < 1.2
+    elif trend_indicator != 0.0 and volatility_ratio < config.trend_vol_ratio:
+        regime = MarketRegime.TREND_FOLLOWING
+        threshold = config.default_threshold  # ±0.15
+        contradiction_mult = 0.4
+    # Mean-reversion: R = 0 AND V_r < 1.0
+    elif trend_indicator == 0.0 and volatility_ratio < config.mean_reversion_vol_ratio:
+        regime = MarketRegime.MEAN_REVERSION
+        threshold = config.mean_reversion_threshold  # ±0.20
+        contradiction_mult = 0.4
+    # Uncertainty: all other cases
+    else:
+        regime = MarketRegime.UNCERTAINTY
+        threshold = config.default_threshold  # ±0.15
+        contradiction_mult = config.uncertainty_contradiction_multiplier  # 0.6
+
+    return RegimeClassification(
+        regime=regime,
+        trend_indicator=trend_indicator,
+        volatility_ratio=volatility_ratio,
+        bullish_threshold=threshold,
+        bearish_threshold=-threshold,
+        contradiction_penalty_multiplier=contradiction_mult,
+    )
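To see which branch of `classify_regime` fires for a given history, the rules can be exercised on synthetic series. The sketch below is a condensed, standalone restatement with the `RegimeConfig` defaults hard-coded; `classify`, `ema` and the four series are hypothetical, it returns plain strings instead of `MarketRegime` members, and it omits the NaN check.

```python
import statistics


def ema(values: list[float], period: int) -> float:
    # Same windowed EMA as compute_ema: seed with the oldest value in the window.
    data = values[-period:]
    alpha = 2.0 / (period + 1)
    out = data[0]
    for v in data[1:]:
        out += alpha * (v - out)
    return out


def classify(prices: list[float], returns: list[float]) -> tuple[str, float]:
    """Condensed restatement of classify_regime's rules (RegimeConfig defaults).

    Returns (regime_name, bullish_threshold). Hypothetical helper for illustration.
    """
    if len(prices) < 100 or len(returns) < 100:
        return "UNCERTAINTY", 0.15
    diff = ema(prices, 20) - ema(prices, 100)
    trend = (diff > 0) - (diff < 0)  # sign as -1, 0 or 1
    s_short = statistics.stdev(returns[-20:])
    s_long = statistics.stdev(returns[-100:])
    if s_short == 0.0 or s_long == 0.0:
        return "UNCERTAINTY", 0.15
    v_r = s_short / s_long
    if v_r > 1.5:
        return "PANIC", 0.10
    if trend != 0 and v_r < 1.2:
        return "TREND_FOLLOWING", 0.15
    if trend == 0 and v_r < 1.0:
        return "MEAN_REVERSION", 0.20
    return "UNCERTAINTY", 0.15


rising = [100.0 + i for i in range(120)]
calm = [0.01, -0.01] * 50
shock = [0.01, -0.01] * 40 + [0.05, -0.05] * 10
flat = [100.0] * 120
quieting = [0.02, -0.02] * 40 + [0.01, -0.01] * 10

print(classify(rising, calm))       # ('TREND_FOLLOWING', 0.15)
print(classify(rising, shock))      # ('PANIC', 0.1)
print(classify(flat, quieting))     # ('MEAN_REVERSION', 0.2)
print(classify(rising[:50], calm))  # ('UNCERTAINTY', 0.15)
```

Because `R` is zero only when the two EMAs are exactly equal as floats, the mean-reversion branch appears reachable in practice only for perfectly flat price windows such as `flat` above.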
@@ -4,7 +4,7 @@ integration for aggregation.
 Provides scoring functions used by the aggregation engine to weight
 document intelligence signals when computing trend summaries.

-Requirements: 6.1, 6.2, 6.5
+Requirements: 2.1–2.6, 3.1–3.5, 4.2–4.3, 5.1–5.7, 6.1–6.5, 16.4–16.5
 """
 from __future__ import annotations

@@ -14,6 +14,24 @@ from datetime import datetime, timezone

 from services.shared.schemas import MarketContext

+# ---------------------------------------------------------------------------
+# Event type base rates for information gain computation (Req 3.1)
+# ---------------------------------------------------------------------------
+
+EVENT_TYPE_BASE_RATES: dict[str, float] = {
+    "earnings": 0.25,
+    "product_launch": 0.10,
+    "regulatory": 0.08,
+    "legal": 0.05,
+    "m_and_a": 0.03,
+    "management_change": 0.06,
+    "partnership": 0.12,
+    "market_expansion": 0.09,
+    "restructuring": 0.04,
+    "dividend": 0.15,
+}
+DEFAULT_BASE_RATE = 0.1
+

 @dataclass(frozen=True)
 class ScoringConfig:
@@ -62,6 +80,37 @@ class ScoringConfig:
     volume_surge_threshold_pct: float = 50.0
     volume_surge_boost: float = 0.15

+    # --- Probabilistic scoring parameters ---
+
+    # Toggle: when True, use probabilistic formulas (sigmoid gate,
+    # adaptive decay, info gain, regime multiplier, source accuracy).
+    # When False, preserve exact current heuristic behaviour.
+    probabilistic: bool = False
+
+    # Sigmoid gate parameters — smooth replacement for binary confidence gate.
+    # Gate value: σ(k·(x - midpoint)) where k = steepness.
+    sigmoid_steepness: float = 5.0
+    sigmoid_midpoint: float = 0.5
+
+    # Information gain parameters — surprise weighting for rare events.
+    # r = 1 + λ·(-log₂ P(event_type)), clamped to info_gain_max.
+    info_gain_lambda: float = 0.3
+    info_gain_max: float = 3.0
+    default_base_rate: float = 0.1
+
+    # Adaptive decay parameters — β scaling factors for event-specific
+    # half-life adjustment: τ_i = τ_base · (1+β_impact)·(1+β_surprise)·(1+β_market).
+    adaptive_decay_impact_scale: float = 1.0
+    adaptive_decay_surprise_scale: float = 1.0
+    adaptive_decay_market_scale: float = 0.5
+
+    # Regime multiplier parameters — replaces market context multiplier.
+    # M_regime = 1 + regime_return_weight·|z_r| + regime_volume_weight·|z_v|,
+    # clamped to [1.0, regime_multiplier_max].
+    regime_return_weight: float = 0.15
+    regime_volume_weight: float = 0.10
+    regime_multiplier_max: float = 2.5
+

 # Singleton default config
 DEFAULT_CONFIG = ScoringConfig()
@@ -77,6 +126,8 @@ def recency_weight(
     reference_time: datetime,
     window: str,
     config: ScoringConfig = DEFAULT_CONFIG,
+    *,
+    half_life_override: float | None = None,
 ) -> float:
     """Compute an exponential recency decay weight for a document.

@@ -87,6 +138,8 @@ def recency_weight(
         reference_time: The "now" anchor for the aggregation window (tz-aware).
         window: One of the TrendWindow values (e.g. "7d").
         config: Scoring parameters.
+        half_life_override: If provided, use this half-life instead of the
+            window-based default (used for adaptive decay).

     Returns:
         A weight in [config.min_recency_weight, 1.0].
@@ -102,7 +155,7 @@ def recency_weight(
         return 1.0

     age_hours = age_seconds / 3600.0
-    half_life = config.half_life_hours.get(window, 72.0)
+    half_life = half_life_override if half_life_override is not None else config.half_life_hours.get(window, 72.0)

     weight = math.pow(2.0, -age_hours / half_life)
     return max(weight, config.min_recency_weight)
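The only behavioural change in `recency_weight` is which half-life feeds the decay. A standalone sketch of the decay step follows; `decay_weight` is a hypothetical name, the 0.05 floor is an assumed value for `config.min_recency_weight` (its default is not shown in this diff), and the early `return 1.0` for non-positive ages is inferred from the context line rather than visible here.

```python
import math
from datetime import datetime, timedelta, timezone


def decay_weight(
    published_at: datetime,
    reference_time: datetime,
    half_life_hours: float,
    floor: float = 0.05,  # stand-in for config.min_recency_weight (assumed value)
) -> float:
    """w = 2^(-age_hours / half_life), floored; mirrors recency_weight's tail."""
    age_seconds = (reference_time - published_at).total_seconds()
    if age_seconds <= 0:
        return 1.0  # assumed guard behind the `return 1.0` context line
    age_hours = age_seconds / 3600.0
    return max(math.pow(2.0, -age_hours / half_life_hours), floor)


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
three_days_ago = now - timedelta(hours=72)

print(decay_weight(three_days_ago, now, 72.0))              # 0.5: one half-life
print(round(decay_weight(three_days_ago, now, 144.0), 4))   # 0.7071: override doubles tau
print(decay_weight(now - timedelta(days=30), now, 72.0))    # 0.05: floor reached
```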
@@ -170,6 +223,188 @@ def market_context_multiplier(
     return 1.0 + boost


+# ---------------------------------------------------------------------------
+# Sigmoid confidence gate (Req 2.1–2.6)
+# ---------------------------------------------------------------------------
+
+
+def sigmoid_gate(
+    x: float,
+    steepness: float = 5.0,
+    midpoint: float = 0.5,
+) -> float:
+    """Smooth sigmoid confidence gate: σ(k·(x - midpoint)).
+
+    Replaces the binary 0/1 confidence gate in probabilistic mode.
+    Returns a value in [0, 1]; higher confidence produces a higher gate.
+
+    Args:
+        x: Extraction confidence value, typically in [0, 1].
+        steepness: Steepness parameter k (default 5.0).
+        midpoint: Midpoint of the sigmoid transition (default 0.5).
+
+    Returns:
+        Gate value in [0, 1] (exactly 0.0 or 1.0 when |z| > 500).
+    """
+    z = steepness * (x - midpoint)
+    # Clamp extreme z: math.exp(-z) overflows for z below about -709
+    if z < -500.0:
+        return 0.0
+    if z > 500.0:
+        return 1.0
+    return 1.0 / (1.0 + math.exp(-z))
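The practical difference between the heuristic step gate and `sigmoid_gate` shows up around the threshold. The sketch below compares the two; `binary_gate` is a hypothetical stand-in for the heuristic-mode gate, and its 0.5 floor is an assumed value for `config.confidence_floor`, which this diff does not show.

```python
import math


def sigmoid_gate(x: float, steepness: float = 5.0, midpoint: float = 0.5) -> float:
    # Same logistic form as the diff: 1 / (1 + e^(-k (x - m))).
    z = steepness * (x - midpoint)
    if z < -500.0:
        return 0.0
    if z > 500.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(-z))


def binary_gate(x: float, floor: float = 0.5) -> float:
    # Heuristic-mode gate; floor=0.5 is an assumed confidence_floor value.
    return 1.0 if x >= floor else 0.0


for conf in (0.0, 0.3, 0.49, 0.51, 0.7, 1.0):
    print(f"{conf:.2f}  binary={binary_gate(conf):.0f}  sigmoid={sigmoid_gate(conf):.4f}")
```

With the default steepness of 5 the gate stays fairly soft: a confidence of 0.0 still keeps about 7.6% of its weight (σ(-2.5)), and a confidence of 1.0 reaches only about 92.4% (σ(2.5)), whereas the step gate jumps from 0 to 1 between 0.49 and 0.51.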
+
+
+# ---------------------------------------------------------------------------
+# Information gain surprise weighting (Req 3.1–3.5)
+# ---------------------------------------------------------------------------
+
+
+def compute_info_gain(
+    event_type: str | None,
+    lambda_param: float = 0.3,
+    max_gain: float = 3.0,
+    default_base_rate: float = 0.1,
+) -> float:
+    """Compute information gain factor for an event type.
+
+    Formula: r = 1 + λ·(-log₂ P(event_type)), clamped to [1.0, max_gain].
+
+    Rarer events produce higher surprise weight. Unknown event types
+    use the default base rate.
+
+    Args:
+        event_type: Event type string (e.g. "earnings", "m_and_a").
+        lambda_param: Scaling parameter λ (default 0.3).
+        max_gain: Maximum clamp for the info gain factor (default 3.0).
+        default_base_rate: Fallback base rate for unknown event types.
+
+    Returns:
+        Information gain factor r in [1.0, max_gain].
+    """
+    if event_type is None:
+        return 1.0
+
+    base_rate = EVENT_TYPE_BASE_RATES.get(event_type, default_base_rate)
+    # Guard against log₂(0) — base rates must be > 0
+    if base_rate <= 0.0:
+        base_rate = default_base_rate
+    if base_rate <= 0.0:
+        return 1.0
+
+    surprise = -math.log2(base_rate)
+    r = 1.0 + lambda_param * surprise
+    return min(max(r, 1.0), max_gain)
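Plugging the `EVENT_TYPE_BASE_RATES` table into the formula shows how much headroom the `info_gain_max` clamp actually has. The sketch below copies the table and restates the formula with the default λ = 0.3; `BASE_RATES` and `info_gain` are hypothetical names for this standalone check.

```python
import math

# Copied from EVENT_TYPE_BASE_RATES in the diff.
BASE_RATES = {
    "earnings": 0.25, "product_launch": 0.10, "regulatory": 0.08,
    "legal": 0.05, "m_and_a": 0.03, "management_change": 0.06,
    "partnership": 0.12, "market_expansion": 0.09,
    "restructuring": 0.04, "dividend": 0.15,
}


def info_gain(p: float, lam: float = 0.3, max_gain: float = 3.0) -> float:
    # r = 1 + λ·(-log2 p), clamped to [1, max_gain]
    return min(max(1.0 + lam * -math.log2(p), 1.0), max_gain)


for event, p in sorted(BASE_RATES.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{event:18s} P={p:.2f}  r={info_gain(p):.3f}")

# Smallest base rate at which the max_gain cap starts to bind:
p_cap = 2.0 ** (-(3.0 - 1.0) / 0.3)
print(f"cap binds below P = {p_cap:.4f}")  # cap binds below P = 0.0098
```

With these defaults r ranges from 1.6 (earnings) to about 2.52 (m_and_a), so the 3.0 cap only matters for a base rate below roughly 1%, none of which appear in the current table.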
+
+
+# ---------------------------------------------------------------------------
+# Adaptive recency decay (Req 5.1–5.7)
+# ---------------------------------------------------------------------------
+
+
+def compute_adaptive_half_life(
+    base_half_life: float,
+    impact_score: float,
+    info_gain_factor: float,
+    market_multiplier: float,
+    config: ScoringConfig,
+) -> float:
+    """Compute adaptive half-life for event-specific recency decay.
+
+    Formula: τ_i = τ_base · (1 + β_impact) · (1 + β_surprise) · (1 + β_market)
+
+    The adaptive half-life is always >= base_half_life (decay is never faster).
+
+    Args:
+        base_half_life: Fixed half-life for the window (hours).
+        impact_score: Signal impact score in [0, 1].
+        info_gain_factor: Information gain factor r in [1.0, 3.0].
+        market_multiplier: Market context/regime multiplier in [1.0, ~2.5].
+        config: Scoring config with adaptive decay scale parameters.
+
+    Returns:
+        Adaptive half-life in hours, >= base_half_life.
+    """
+    # β_impact: impact_score scaled linearly 0→0, 1→adaptive_decay_impact_scale
+    beta_impact = impact_score * config.adaptive_decay_impact_scale
+
+    # β_surprise: info_gain_factor scaled linearly r=1→0, r=3→adaptive_decay_surprise_scale
+    beta_surprise = ((info_gain_factor - 1.0) / 2.0) * config.adaptive_decay_surprise_scale
+
+    # β_market: market_multiplier scaled linearly 1.0→0, 1.45→adaptive_decay_market_scale
+    if market_multiplier > 1.0:
+        beta_market = ((market_multiplier - 1.0) / 0.45) * config.adaptive_decay_market_scale
+    else:
+        beta_market = 0.0
+
+    tau = base_half_life * (1.0 + beta_impact) * (1.0 + beta_surprise) * (1.0 + beta_market)
+    # Ensure adaptive half-life is never less than base (Property 5)
+    return max(tau, base_half_life)
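A worked example helps size the adaptive half-life. The sketch below restates `compute_adaptive_half_life` with the `ScoringConfig` default scales passed as keyword arguments instead of a config object; `adaptive_half_life` is a hypothetical name and 72 h is the fallback half-life used in the diff.

```python
def adaptive_half_life(base: float, impact: float, info_gain: float, market_mult: float,
                       impact_scale: float = 1.0, surprise_scale: float = 1.0,
                       market_scale: float = 0.5) -> float:
    """Restates compute_adaptive_half_life with the ScoringConfig defaults."""
    beta_impact = impact * impact_scale
    beta_surprise = ((info_gain - 1.0) / 2.0) * surprise_scale  # r=1 -> 0, r=3 -> scale
    beta_market = ((market_mult - 1.0) / 0.45) * market_scale if market_mult > 1.0 else 0.0
    tau = base * (1 + beta_impact) * (1 + beta_surprise) * (1 + beta_market)
    return max(tau, base)  # never decay faster than the window default


print(adaptive_half_life(72.0, impact=0.0, info_gain=1.0, market_mult=1.0))   # 72.0
print(adaptive_half_life(72.0, impact=0.5, info_gain=2.0, market_mult=1.45))  # 243.0
print(adaptive_half_life(72.0, impact=1.0, info_gain=3.0, market_mult=2.5))   # ≈ 768.0
```

The middle case multiplies three factors of 1.5 (72 × 1.5³ = 243 h). At the extremes (impact 1, r = 3, M = 2.5) β_market reaches about 1.67, so with the default scales a 72 h half-life can stretch to roughly 768 h, about 10.7 times the base.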
+
+
+# ---------------------------------------------------------------------------
+# Regime multiplier (Req 6.1–6.5)
+# ---------------------------------------------------------------------------
+
+
+def compute_regime_multiplier(
+    returns: list[float] | None,
+    volumes: list[float] | None,
+    config: ScoringConfig = DEFAULT_CONFIG,
+) -> float:
+    """Compute regime-aware multiplier from return and volume z-scores.
+
+    Formula: M_regime = 1 + 0.15·|z_r| + 0.10·|z_v|, clamped to [1.0, max].
+
+    Args:
+        returns: Recent daily returns; z_r uses the last 20 (needs >= 2 non-NaN).
+        volumes: Recent daily volumes; z_v uses log volumes over the last 20.
+        config: Scoring config with regime multiplier parameters.
+
+    Returns:
+        Regime multiplier in [1.0, config.regime_multiplier_max].
+    """
+    if not returns or len(returns) < 2:
+        return 1.0
+
+    # Filter out NaN values from returns
+    clean_returns = [r for r in returns if not math.isnan(r)]
+    if len(clean_returns) < 2:
+        return 1.0
+
+    # Return z-score: z_r = (r_t - μ_20) / σ_20
+    r_window = clean_returns[-20:] if len(clean_returns) >= 20 else clean_returns
+    r_t = clean_returns[-1]
+    mu_r = sum(r_window) / len(r_window)
+    var_r = sum((x - mu_r) ** 2 for x in r_window) / len(r_window)
+    sigma_r = math.sqrt(var_r)
+
+    z_r = 0.0
+    if sigma_r > 0.0:
+        z_r = (r_t - mu_r) / sigma_r
+
+    # Volume z-score: z_v = (log(V_t) - μ_V) / σ_V
+    z_v = 0.0
+    if volumes and len(volumes) >= 2:
+        clean_volumes = [v for v in volumes if not math.isnan(v)]
+        if len(clean_volumes) >= 2:
+            v_window = clean_volumes[-20:] if len(clean_volumes) >= 20 else clean_volumes
+            # Use log-volumes, guard against zero/negative volumes
+            log_vols = [math.log(max(v, 1.0)) for v in v_window]
+            log_v_t = math.log(max(clean_volumes[-1], 1.0))
+            mu_v = sum(log_vols) / len(log_vols)
+            var_v = sum((x - mu_v) ** 2 for x in log_vols) / len(log_vols)
+            sigma_v = math.sqrt(var_v)
+            if sigma_v > 0.0:
+                z_v = (log_v_t - mu_v) / sigma_v
+
+    m_regime = 1.0 + config.regime_return_weight * abs(z_r) + config.regime_volume_weight * abs(z_v)
+    # Guard against NaN propagation from upstream data
+    if math.isnan(m_regime) or math.isinf(m_regime):
+        return 1.0
+    return max(1.0, min(m_regime, config.regime_multiplier_max))
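The largest multiplier `compute_regime_multiplier` can produce is set by how the z-scores are computed. The sketch below is a simplified restatement without the NaN filtering; `zscore_last` and `regime_multiplier` are hypothetical names, and the inputs are the worst case for a 20-value window: one outlier on the latest day for both returns and volumes.

```python
import math


def zscore_last(xs: list[float]) -> float:
    # z of the newest value against the mean/population std of the last 20 values
    # (the newest value is part of its own window, as in compute_regime_multiplier).
    window = xs[-20:]
    mu = sum(window) / len(window)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in window) / len(window))
    return (xs[-1] - mu) / sigma if sigma > 0.0 else 0.0


def regime_multiplier(returns: list[float], volumes: list[float],
                      w_r: float = 0.15, w_v: float = 0.10, cap: float = 2.5) -> float:
    z_r = zscore_last(returns)
    z_v = zscore_last([math.log(max(v, 1.0)) for v in volumes])
    return max(1.0, min(1.0 + w_r * abs(z_r) + w_v * abs(z_v), cap))


quiet_returns = [0.0] * 19 + [0.05]           # one large move on the latest day
volumes = [1_000_000.0] * 19 + [5_000_000.0]  # volume spike on the same day
print(round(zscore_last(quiet_returns), 4))                # 4.3589 (= sqrt(19))
print(round(regime_multiplier(quiet_returns, volumes), 4))  # 2.0897
```

Because r_t sits inside its own window and σ is the population standard deviation, Samuelson's inequality bounds every |z| by √(n − 1) = √19 ≈ 4.36. With the default weights M_regime therefore appears to top out near 1 + 0.25·√19 ≈ 2.09, below the 2.5 value of `regime_multiplier_max`.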
+
+
 # ---------------------------------------------------------------------------
 # Combined document signal weight
 # ---------------------------------------------------------------------------
@@ -186,6 +421,12 @@ class SignalWeight:
     market_ctx_multiplier: float  # >= 1.0
     combined: float

+    # New optional fields for probabilistic mode
+    sigmoid_gate: float | None = None  # Smooth gate value [0, 1]
+    info_gain_factor: float = 1.0  # Surprise multiplier
+    source_accuracy_factor: float = 1.0  # Historical accuracy multiplier
+    regime_multiplier: float | None = None  # M_regime replacing M_context
+

 def compute_signal_weight(
     published_at: datetime,
@@ -196,18 +437,23 @@ def compute_signal_weight(
     extraction_confidence: float = 0.5,
     market_ctx: MarketContext | None = None,
     config: ScoringConfig = DEFAULT_CONFIG,
+    *,
+    event_type: str | None = None,
+    impact_score: float = 0.5,
+    source_accuracy_factor: float = 1.0,
+    returns: list[float] | None = None,
+    volumes: list[float] | None = None,
 ) -> SignalWeight:
     """Compute the combined aggregation weight for a single document signal.

-    The formula is:
+    When ``config.probabilistic`` is False (default), the formula is:
         combined = confidence_gate * recency * credibility
             * (1 + novelty_bonus) * market_ctx_multiplier

     where novelty_bonus = novelty_score * config.novelty_bonus_max
     and market_ctx_multiplier >= 1.0 based on volatility/volume features.

     Documents with extraction_confidence below config.confidence_floor
     receive a combined weight of 0.0 (gated out).
+    When ``config.probabilistic`` is True, the formula is:
+        combined = sigmoid_gate * recency(adaptive) * credibility
+            * (1 + novelty_bonus) * info_gain * source_accuracy
+            * regime_multiplier

     Args:
         published_at: Document publication time.
@@ -218,27 +464,82 @@ def compute_signal_weight(
         extraction_confidence: Extraction confidence from the model (0-1).
         market_ctx: Optional market context features for the symbol.
         config: Scoring parameters.
+        event_type: Optional event type for information gain computation.
+        impact_score: Signal impact score in [0, 1] (default 0.5).
+        source_accuracy_factor: Historical source accuracy factor (default 1.0).
+        returns: Optional list of recent daily returns for regime multiplier.
+        volumes: Optional list of recent daily volumes for regime multiplier.

     Returns:
         A ``SignalWeight`` with the component breakdown and combined score.
     """
-    # Confidence gate
-    gate = 1.0 if extraction_confidence >= config.confidence_floor else 0.0
-
-    rec = recency_weight(published_at, reference_time, window, config)
     cred = credibility_weight(source_credibility, config)
     bonus = novelty_score * config.novelty_bonus_max
-    mkt_mult = market_context_multiplier(market_ctx, config)

-    combined = gate * rec * cred * (1.0 + bonus) * mkt_mult
+    if not config.probabilistic:
+        # --- Heuristic mode: preserve exact current formula ---
+        gate = 1.0 if extraction_confidence >= config.confidence_floor else 0.0
+        rec = recency_weight(published_at, reference_time, window, config)
+        mkt_mult = market_context_multiplier(market_ctx, config)
+
+        combined = gate * rec * cred * (1.0 + bonus) * mkt_mult
+
+        return SignalWeight(
+            recency=rec,
+            credibility=cred,
+            novelty_bonus=bonus,
+            confidence_gate=gate,
+            market_ctx_multiplier=mkt_mult,
+            combined=combined,
+        )
+
+    # --- Probabilistic mode ---
+
+    # 1. Sigmoid confidence gate (Req 2.1–2.5)
+    sg = sigmoid_gate(extraction_confidence, config.sigmoid_steepness, config.sigmoid_midpoint)
+
+    # 2. Information gain factor (Req 3.1–3.5)
+    ig = compute_info_gain(
+        event_type,
+        lambda_param=config.info_gain_lambda,
+        max_gain=config.info_gain_max,
+        default_base_rate=config.default_base_rate,
+    )
+
+    # 3. Regime multiplier (Req 6.1–6.5) — replaces market_context_multiplier
+    rm = compute_regime_multiplier(returns, volumes, config)
+
+    # 4. Adaptive recency decay (Req 5.1–5.7)
+    base_half_life = config.half_life_hours.get(window, 72.0)
+    adaptive_hl = compute_adaptive_half_life(
+        base_half_life=base_half_life,
+        impact_score=impact_score,
+        info_gain_factor=ig,
+        market_multiplier=rm,
+        config=config,
+    )
+    rec = recency_weight(
+        published_at, reference_time, window, config,
+        half_life_override=adaptive_hl,
+    )
+
+    # 5. Source accuracy factor (Req 4.2–4.3)
+    saf = source_accuracy_factor
+
+    # 6. Combined weight
+    combined = sg * rec * cred * (1.0 + bonus) * ig * saf * rm

     return SignalWeight(
         recency=rec,
         credibility=cred,
         novelty_bonus=bonus,
-        confidence_gate=gate,
-        market_ctx_multiplier=mkt_mult,
+        confidence_gate=sg,  # sigmoid gate value in probabilistic mode
+        market_ctx_multiplier=rm,  # regime multiplier stored here for compat
         combined=combined,
+        sigmoid_gate=sg,
+        info_gain_factor=ig,
+        source_accuracy_factor=saf,
+        regime_multiplier=rm,
     )
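The two branches of `compute_signal_weight` can disagree sharply for a document just below the confidence floor. The sketch below feeds the same component values through both products; `heuristic_weight` and `probabilistic_weight` are hypothetical helpers, the recency, credibility and multiplier inputs are illustrative numbers rather than outputs of the real component functions, and 0.5 is an assumed `confidence_floor`.

```python
import math


def heuristic_weight(conf, rec, cred, bonus, mkt_mult, floor=0.5):
    # floor stands in for config.confidence_floor (0.5 is an assumed value)
    gate = 1.0 if conf >= floor else 0.0
    return gate * rec * cred * (1.0 + bonus) * mkt_mult


def probabilistic_weight(conf, rec, cred, bonus, info_gain, source_acc, regime_mult,
                         steepness=5.0, midpoint=0.5):
    # sigmoid gate with the ScoringConfig defaults, then the probabilistic product
    gate = 1.0 / (1.0 + math.exp(-steepness * (conf - midpoint)))
    return gate * rec * cred * (1.0 + bonus) * info_gain * source_acc * regime_mult


# A document just below the (assumed) confidence floor:
h = heuristic_weight(conf=0.45, rec=0.6, cred=0.8, bonus=0.1, mkt_mult=1.0)
p = probabilistic_weight(conf=0.45, rec=0.6, cred=0.8, bonus=0.1,
                         info_gain=1.6, source_acc=1.2, regime_mult=1.0)
print(h)            # 0.0
print(round(p, 4))  # 0.4438
```

In heuristic mode the document is gated out entirely; in probabilistic mode it keeps about 44% of the gate's full value because the earnings info gain (1.6) and a good source record (1.2) offset the soft gate of about 0.44.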


@@ -256,6 +557,11 @@ class WeightedSignal:
     sentiment_value: float  # numeric sentiment: +1 positive, -1 negative, 0 neutral/mixed
     impact_score: float

+    # New optional fields for probabilistic mode
+    info_gain_factor: float = 1.0  # r = 1 + λ·(-log₂ P(event_type))
+    source_accuracy_factor: float = 1.0  # [0.5, 1.5] from historical accuracy
+    adaptive_half_life: float | None = None  # τ_i when adaptive decay is active
+

 def sentiment_to_numeric(sentiment: str) -> float:
     """Map a sentiment label to a signed numeric value."""
@@ -8,11 +8,12 @@ competitive_signal_records.
 Also converts pattern and competitive signals into WeightedSignal
 objects for the aggregation engine.

-Requirements: 4.1, 4.2, 4.3, 4.4, 4.5, 9.1
+Requirements: 4.1, 4.2, 4.3, 4.4, 4.5, 9.1, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7
 """
 from __future__ import annotations

 import logging
+import math
 from dataclasses import dataclass
 from datetime import datetime, timezone
 from typing import Optional
@@ -76,6 +77,38 @@ VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
 """


+# ---------------------------------------------------------------------------
+# Graph-distance attenuation (Requirements: 12.1–12.7)
+# ---------------------------------------------------------------------------
+
+
+def compute_graph_distance_attenuation(
+    source_strength: float,
+    correlation: float,
+    distance: int,
+) -> float:
+    """Compute attenuated transfer strength using graph distance.
+
+    Formula: S_transfer = S_source · ρ_historical · e^(-d_network)
+
+    Args:
+        source_strength: Source signal strength S_source in [0, 1].
+        correlation: Historical price correlation ρ_historical in [0, 1].
+        distance: Graph distance d_network (shortest path; above 3 yields 0.0).
+
+    Returns:
+        Transfer strength (non-negative for inputs in range). Returns 0.0
+        when distance is below 1 or above 3.
+
+    Requirements: 12.1, 12.7
+    """
+    if distance < 1:
+        return 0.0
+    if distance > 3:
+        return 0.0
+    return source_strength * correlation * math.exp(-distance)
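The attenuation curve and its effect on `propagate_signals` can be checked numerically. The sketch below restates `compute_graph_distance_attenuation` under the hypothetical name `graph_attenuation`, then compares it with the flat-transfer branch using hypothetical pattern inputs, with `graph_distance = 1` and `correlation = max(rel_strength, 0.1)` as in the diff.

```python
import math


def graph_attenuation(source_strength: float, correlation: float, distance: int) -> float:
    # S_transfer = S_source · ρ · e^(-d); distances outside 1..3 do not propagate
    if distance < 1 or distance > 3:
        return 0.0
    return source_strength * correlation * math.exp(-distance)


for d in range(0, 5):
    print(d, round(graph_attenuation(0.8, 0.5, d), 4))
# 0 0.0
# 1 0.1472
# 2 0.0541
# 3 0.0199
# 4 0.0

# Hypothetical pattern inputs, to compare with the flat transfer branch:
avg_strength, pattern_conf, impact, rel_strength = 0.9, 0.8, 0.7, 0.6
flat = avg_strength * rel_strength * pattern_conf * impact
prob = graph_attenuation(avg_strength * pattern_conf * impact, max(rel_strength, 0.1), 1)
print(round(prob / flat, 4))  # 0.3679 (= e^-1)
```

With the distance fixed at 1, probabilistic mode therefore appears to scale each competitor signal by e^-1 ≈ 0.37 relative to flat transfer whenever `rel_strength` is at least 0.1 and the pattern product stays within [0, 1].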
+
+
 # ---------------------------------------------------------------------------
 # propagate_signals
 # ---------------------------------------------------------------------------
@@ -87,10 +120,20 @@ async def propagate_signals(
     impact_score: float,
     document_id: str,
     config: Optional[CompetitiveConfig] = None,
+    *,
+    probabilistic: bool = False,
 ) -> list[CompetitiveSignalRecord]:
     """Look up competitors, query cross-company patterns, produce weighted
     competitive signals, and persist them.

+    When ``probabilistic=True``, uses graph-distance attenuation:
+        S_transfer = S_source · ρ_historical · e^(-d_network)
+    with 90-day rolling Pearson correlation for ρ_historical and shortest
+    path in the competitor relationship graph for d_network (max 3 hops).
+
+    When ``probabilistic=False``, preserves the existing flat transfer
+    behavior.
+
     Args:
         pool: asyncpg connection pool.
         ticker: Source company ticker that received the catalyst.
@@ -98,9 +141,12 @@ async def propagate_signals(
         impact_score: The source document's impact score.
         document_id: The source document ID.
         config: Optional competitive config overrides.
+        probabilistic: Use graph-distance attenuation when True.

     Returns:
         List of CompetitiveSignalRecord objects produced and persisted.
+
+    Requirements: 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7
     """
     cfg = config or CompetitiveConfig()
     now = datetime.now(timezone.utc)
@@ -127,7 +173,7 @@ async def propagate_signals(
         # Determine the competitor ticker (the other side of the relationship)
         competitor_ticker = ticker_b if ticker_a == ticker else ticker_a

-        # Threshold gating (Req 4.5)
+        # Threshold gating (Req 4.5 / Req 12.6)
         if rel_strength < cfg.propagation_strength_threshold:
             logger.info(
                 "Skipping propagation %s→%s: relationship strength %.3f "
@@ -161,14 +207,39 @@ async def propagate_signals(
                 )
                 continue

-            # Compute signal strength (Req 4.3)
-            raw_strength = (
-                pattern.avg_strength
-                * rel_strength
-                * pattern.pattern_confidence
-                * impact_score
-            )
-            signal_strength = min(max(raw_strength, 0.0), 1.0)
+            if probabilistic:
+                # Graph-distance attenuation (Req 12.1–12.7)
+                # For direct competitors, graph distance = 1
+                graph_distance = 1
+
+                # Use relationship strength as a proxy for historical
+                # correlation when full correlation data is unavailable.
+                # Default correlation: 0.3 same-sector, 0.1 cross-sector.
+                # Here we use rel_strength as a reasonable proxy since
+                # the full 90-day Pearson correlation requires market data
+                # that is fetched asynchronously in the integration layer.
+                correlation = max(rel_strength, 0.1)
+
+                source_strength = (
+                    pattern.avg_strength
+                    * pattern.pattern_confidence
+                    * impact_score
+                )
+                raw_strength = compute_graph_distance_attenuation(
+                    source_strength=min(max(source_strength, 0.0), 1.0),
+                    correlation=correlation,
+                    distance=graph_distance,
+                )
+                signal_strength = min(max(raw_strength, 0.0), 1.0)
+            else:
+                # Flat transfer (existing behavior, Req 4.3)
+                raw_strength = (
+                    pattern.avg_strength
+                    * rel_strength
+                    * pattern.pattern_confidence
+                    * impact_score
+                )
+                signal_strength = min(max(raw_strength, 0.0), 1.0)

             # Determine direction
             direction = (
@@ -0,0 +1,164 @@
+"""Source accuracy tracker for historical prediction accuracy per source.
+
+Tracks per-source accuracy metrics (fraction of correct directional calls)
+used by the probabilistic scoring pipeline to weight source credibility.
+Accuracy data is stored in the ``source_accuracy`` database table and
+fetched in batch at the start of each aggregation cycle.
+
+Requirements: 4.1, 4.2, 4.3, 4.4, 4.5
+"""
+from __future__ import annotations
+
+import logging
+from dataclasses import dataclass
+from datetime import datetime, timezone
+
+import asyncpg
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class SourceAccuracy:
+    """Per-source historical prediction accuracy.
+
+    Attributes:
+        source_id: Unique identifier for the signal source.
+        accuracy_ratio: Fraction of correct directional calls, in [0, 1].
+        sample_count: Number of signals with known outcomes.
+        last_updated: Timestamp of the most recent accuracy update.
+    """
+
+    source_id: str
+    accuracy_ratio: float
+    sample_count: int
+    last_updated: datetime
+
+    @property
+    def accuracy_factor(self) -> float:
+        """Multiplicative factor for credibility weight.
+
+        Returns 1.0 (neutral) when sample_count < 10.
+        Otherwise scales linearly from 0.5 (0% accuracy) to 1.5
+        (100% accuracy). Corrupted accuracy_ratio values outside
+        [0, 1] are clamped before computing the factor.
+        """
+        if self.sample_count < 10:
+            return 1.0
+        clamped = max(0.0, min(1.0, self.accuracy_ratio))
+        return 0.5 + clamped
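The `accuracy_factor` mapping is linear with a neutral point, which a few cases make concrete. The sketch below carries a trimmed standalone copy of `SourceAccuracy` so it runs without `asyncpg`; the source ids, ratios and sample counts are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SourceAccuracy:
    # Trimmed copy of the dataclass in the diff, kept here so the sketch runs alone.
    source_id: str
    accuracy_ratio: float
    sample_count: int
    last_updated: datetime

    @property
    def accuracy_factor(self) -> float:
        if self.sample_count < 10:
            return 1.0  # too few outcomes: stay neutral
        return 0.5 + max(0.0, min(1.0, self.accuracy_ratio))


ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
cases = [("new-blog", 0.9, 5), ("coin-flip", 0.5, 200), ("good-wire", 0.7, 50),
         ("bad-feed", 0.25, 40), ("corrupt", 1.3, 12)]
for sid, ratio, n in cases:
    print(sid, SourceAccuracy(sid, ratio, n, ts).accuracy_factor)
```

A source that is right half the time lands exactly on the neutral 1.0, so only sources measurably better or worse than a coin flip move the weight; a 25% hit rate gives 0.75, and an out-of-range 1.3 is clamped to the 1.5 ceiling.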
+
+
+async def fetch_source_accuracy(
+    pool: asyncpg.Pool,
+    source_ids: list[str],
+) -> dict[str, SourceAccuracy]:
+    """Fetch accuracy metrics for a batch of sources.
+
+    Queries the ``source_accuracy`` table for all requested *source_ids*
+    in a single round-trip. Returns a mapping from source_id to its
+    :class:`SourceAccuracy` record.
+
+    When the database is unreachable or the query fails, returns an empty
+    dict so that callers fall back to the neutral accuracy factor of 1.0.
+    """
+    if not source_ids:
+        return {}
+
+    try:
+        rows = await pool.fetch(
+            """
+            SELECT source_id, accuracy_ratio, sample_count, last_updated
+            FROM source_accuracy
+            WHERE source_id = ANY($1::varchar[])
+            """,
+            source_ids,
+        )
+    except Exception:
+        logger.warning(
+            "Failed to fetch source accuracy; defaulting to neutral factor",
+            exc_info=True,
+        )
+        return {}
+
+    result: dict[str, SourceAccuracy] = {}
+    for row in rows:
+        sid = row["source_id"]
+        ratio = row["accuracy_ratio"]
+        # Clamp corrupted accuracy_ratio to [0.0, 1.0]
+        ratio = max(0.0, min(1.0, float(ratio)))
+        result[sid] = SourceAccuracy(
+            source_id=sid,
+            accuracy_ratio=ratio,
+            sample_count=int(row["sample_count"]),
+            last_updated=row["last_updated"],
+        )
+    return result
+
+
+async def update_source_accuracy(
+    pool: asyncpg.Pool,
+    source_id: str,
+    realized_outcomes: list[tuple[str, float]],
+) -> None:
+    """Update accuracy metrics for a source from realized price outcomes.
+
+    Each element of *realized_outcomes* is a ``(predicted_direction,
+    actual_7d_return)`` pair. A prediction is considered correct when:
+
+    * ``predicted_direction`` is ``"bullish"`` and ``actual_7d_return > 0``
+    * ``predicted_direction`` is ``"bearish"`` and ``actual_7d_return < 0``
+
+    Neutral predictions and zero returns are excluded from the accuracy
+    calculation.
+
+    The function upserts the ``source_accuracy`` row, merging the new
+    outcomes with any existing sample count and accuracy ratio.
+    """
+    if not realized_outcomes:
+        return
+
+    # Count correct directional calls from the new outcomes.
+    correct = 0
+    total = 0
+    for predicted_direction, actual_return in realized_outcomes:
+        direction = predicted_direction.lower()
+        if direction not in ("bullish", "bearish"):
+            continue
+        if actual_return == 0.0:
+            continue
+        total += 1
+        if direction == "bullish" and actual_return > 0:
+            correct += 1
+        elif direction == "bearish" and actual_return < 0:
+            correct += 1
+
+    if total == 0:
+        return
+
+    now = datetime.now(timezone.utc)
+
+    try:
+        await pool.execute(
+            """
+            INSERT INTO source_accuracy (source_id, accuracy_ratio, sample_count, last_updated)
+            VALUES ($1, $2, $3, $4)
+            ON CONFLICT (source_id) DO UPDATE SET
+                accuracy_ratio = (
+                    source_accuracy.accuracy_ratio * source_accuracy.sample_count
+                    + $2 * $3
+                ) / NULLIF(source_accuracy.sample_count + $3, 0),
+                sample_count = source_accuracy.sample_count + $3,
+                last_updated = $4
+            """,
+            source_id,
+            correct / total,
+            total,
+            now,
+        )
+    except Exception:
+        logger.warning(
+            "Failed to update source accuracy for %s; continuing with stale data",
+            source_id,
+            exc_info=True,
+        )
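The `ON CONFLICT` branch keeps a sample-weighted running mean: the stored ratio times its count plus `$2 * $3` (the new batch's correct calls) over the combined count. The sketch below mirrors the counting loop and that merge in plain Python; `score_outcomes`, `merge_accuracy` and the outcome list are hypothetical and do not touch a database.

```python
def score_outcomes(outcomes: list[tuple[str, float]]) -> tuple[int, int]:
    """Count (correct, total) the same way update_source_accuracy does."""
    correct = total = 0
    for predicted, actual in outcomes:
        direction = predicted.lower()
        if direction not in ("bullish", "bearish") or actual == 0.0:
            continue  # neutral calls and flat returns are ignored
        total += 1
        if (direction == "bullish" and actual > 0) or (direction == "bearish" and actual < 0):
            correct += 1
    return correct, total


def merge_accuracy(old_ratio: float, old_n: int, correct: int, total: int) -> tuple[float, int]:
    """Python mirror of the ON CONFLICT branch: a sample-weighted running mean."""
    new_ratio = correct / total
    n = old_n + total
    return (old_ratio * old_n + new_ratio * total) / n, n


outcomes = [("bullish", 0.04), ("Bearish", -0.02), ("bullish", -0.01),
            ("neutral", 0.03), ("bearish", 0.0), ("bearish", -0.05)]
correct, total = score_outcomes(outcomes)
print(correct, total)                      # 3 4
ratio, n = merge_accuracy(0.6, 40, correct, total)
print(round(ratio, 4), n)                  # 0.6136 44
```

Here an existing 60% over 40 samples absorbs 3 of 4 correct calls, giving 27/44 ≈ 0.6136. Since the function returns early when `total == 0`, the `NULLIF` guard in the SQL is defensive rather than load-bearing.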
@@ -19,6 +19,10 @@ from typing import Any
|
||||
|
||||
import asyncpg
|
||||
|
||||
from services.aggregation.bayesian import (
|
||||
BayesianPosterior,
|
||||
compute_bayesian_posterior,
|
||||
)
|
||||
from services.aggregation.contradiction import CatalystEntry, detect_contradictions
|
||||
from services.aggregation.evidence import (
|
||||
EvidenceRankConfig,
|
||||
@@ -28,6 +32,7 @@ from services.aggregation.evidence import (
 from services.aggregation.evidence import (
     rank_evidence as _rank_evidence_composite,
 )
+from services.aggregation.interpolation import integrate_macro_signals
 from services.aggregation.market_context import fetch_market_context
 from services.aggregation.pattern_matcher import find_self_patterns
 from services.aggregation.projection import (
@@ -35,6 +40,11 @@ from services.aggregation.projection import (
     compute_projection,
     persist_trend_projection,
 )
+from services.aggregation.regime import (
+    MarketRegime,
+    RegimeClassification,
+    classify_regime,
+)
 from services.aggregation.scoring import (
     ScoringConfig,
     WeightedSignal,
@@ -46,6 +56,7 @@ from services.aggregation.signal_propagation import (
     CompetitiveSignalRecord,
     build_pattern_weighted_signals,
 )
+from services.aggregation.source_accuracy import fetch_source_accuracy
 from services.shared.metrics import (
     AGGREGATION_CONTRADICTION_SCORE,
     AGGREGATION_DURATION,
@@ -80,6 +91,7 @@ class AggregationConfig:
     macro_enabled: bool = True # runtime toggle state
     competitive_signal_weight: float = 0.2 # relative weight of pattern signals
     competitive_enabled: bool = True # runtime toggle state
+    probabilistic_scoring_enabled: bool = False # probabilistic pipeline toggle

     def effective_windows(self) -> list[str]:
         if self.windows:
@@ -232,6 +244,59 @@ async def fetch_competitive_enabled(pool: asyncpg.Pool) -> bool | None:
     return row["competitive_enabled"].lower() == "true"


+# ---------------------------------------------------------------------------
+# Fetch probabilistic scoring toggle from risk_configs
+#
+# PROBABILISTIC PIPELINE TOGGLE (Requirements 16.3, 16.4, 16.5, 16.6, 16.7):
+# - Read once per aggregation cycle from the risk_configs table.
+# - When False (default): the heuristic pipeline is used — identical outputs
+#   to the current system.
+# - When True: the new Bayesian, regime-aware, and adaptive formulas are
+#   used for all pipeline stages.
+# - Defaults to False when the key is missing, the value is invalid, or the
+#   database is unreachable (fail-safe to heuristic mode).
+# ---------------------------------------------------------------------------
+
+_PROBABILISTIC_TOGGLE_QUERY = """
+    SELECT config->>'probabilistic_scoring_enabled' AS probabilistic_scoring_enabled
+    FROM risk_configs
+    WHERE active = TRUE
+    ORDER BY updated_at DESC
+    LIMIT 1
+"""
+
+
+async def fetch_probabilistic_scoring_enabled(pool: asyncpg.Pool) -> bool:
+    """Check probabilistic scoring toggle from risk_configs table.
+
+    Returns True when explicitly enabled, False in all other cases
+    (missing key, invalid value, no config row, DB error).
+    This is fail-safe: any failure defaults to the heuristic pipeline.
+
+    Requirements: 16.3, 16.6
+    """
+    try:
+        row = await pool.fetchrow(_PROBABILISTIC_TOGGLE_QUERY)
+        if row is None or row["probabilistic_scoring_enabled"] is None:
+            return False
+        raw = row["probabilistic_scoring_enabled"]
+        if not isinstance(raw, str) or raw.lower() not in ("true", "false"):
+            logger.warning(
+                "Invalid probabilistic_scoring_enabled value %r in "
+                "risk_configs; defaulting to heuristic pipeline",
+                raw,
+            )
+            return False
+        return raw.lower() == "true"
+    except Exception:
+        logger.warning(
+            "Failed to read probabilistic_scoring_enabled from risk_configs; "
+            "defaulting to heuristic pipeline",
+            exc_info=True,
+        )
+        return False
+
+
 # ---------------------------------------------------------------------------
 # Fetch competitive signals targeting a ticker within a time window
 # ---------------------------------------------------------------------------
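`fetch_probabilistic_scoring_enabled` collapses every failure mode to `False`. Separating the value check from the query makes that rule testable on its own. The sketch assumes the `->>` operator has already produced text (or `None` for a missing key or no active row); `parse_probabilistic_toggle` is a hypothetical helper, not part of the service.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def parse_probabilistic_toggle(raw: Any) -> bool:
    """Interpret config->>'probabilistic_scoring_enabled' fail-safe.

    None (missing key or no active row) and any value other than the strings
    "true"/"false" (case-insensitive) select the heuristic pipeline.
    """
    if raw is None:
        return False
    if not isinstance(raw, str) or raw.lower() not in ("true", "false"):
        logger.warning(
            "Invalid probabilistic_scoring_enabled value %r; "
            "defaulting to heuristic pipeline",
            raw,
        )
        return False
    return raw.lower() == "true"
```

Since `->>` returns JSON booleans as the text `true`/`false`, a config stored as `{"probabilistic_scoring_enabled": true}` and one stored as the string `"True"` both enable the pipeline, while `"yes"` or `"1"` fall back to heuristic mode with a warning.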
@@ -366,6 +431,9 @@ def build_macro_weighted_signals(
     window: str,
     macro_signal_weight: float = 0.3,
     config: ScoringConfig | None = None,
+    *,
+    returns: list[float] | None = None,
+    volumes: list[float] | None = None,
 ) -> list[WeightedSignal]:
     """Convert macro impact records into WeightedSignal objects.

@@ -375,6 +443,9 @@ def build_macro_weighted_signals(
     - impact_score = macro_impact_score * macro_signal_weight
     - recency decay from the global event's publication time
     - confidence gating from the macro record's confidence
+
+    When ``config.probabilistic`` is True, passes returns/volumes for
+    regime multiplier computation.
     """
     cfg = config or ScoringConfig()
     signals: list[WeightedSignal] = []
@@ -387,6 +458,8 @@ def build_macro_weighted_signals(
             novelty_score=0.5,
             extraction_confidence=mir.confidence,
             config=cfg,
+            returns=returns,
+            volumes=volumes,
         )
         sentiment = _DIRECTION_TO_SENTIMENT.get(mir.impact_direction, 0.0)
         impact = mir.macro_impact_score * macro_signal_weight
@@ -412,11 +485,24 @@ def build_weighted_signals(
     window: str,
     market_ctx: Any | None = None,
     config: ScoringConfig | None = None,
+    *,
+    source_accuracy_map: dict[str, float] | None = None,
+    returns: list[float] | None = None,
+    volumes: list[float] | None = None,
 ) -> list[WeightedSignal]:
-    """Convert impact records into WeightedSignal objects using the scoring module."""
+    """Convert impact records into WeightedSignal objects using the scoring module.
+
+    When ``config.probabilistic`` is True, passes source accuracy factors,
+    event types, and market data (returns/volumes) to the scoring pipeline
+    for regime multiplier and adaptive decay computation.
+    """
     cfg = config or ScoringConfig()
+    accuracy_map = source_accuracy_map or {}
     signals: list[WeightedSignal] = []
     for imp in impacts:
+        # Look up source accuracy factor for this document's source
+        saf = accuracy_map.get(imp.document_id, 1.0)
+
         sw = compute_signal_weight(
             published_at=imp.published_at,
             reference_time=reference_time,
@@ -426,6 +512,11 @@ def build_weighted_signals(
             extraction_confidence=imp.confidence,
             market_ctx=market_ctx,
             config=cfg,
+            event_type=imp.catalyst_type if cfg.probabilistic else None,
+            impact_score=imp.impact_score,
+            source_accuracy_factor=saf,
+            returns=returns,
+            volumes=volumes,
         )
         signals.append(
             WeightedSignal(
@@ -433,6 +524,8 @@ def build_weighted_signals(
                 weight=sw,
                 sentiment_value=sentiment_to_numeric(imp.sentiment),
                 impact_score=imp.impact_score,
+                info_gain_factor=sw.info_gain_factor,
+                source_accuracy_factor=sw.source_accuracy_factor,
             )
         )
     return signals
@@ -649,10 +742,15 @@ def assemble_trend_summary(
     market_ctx: Any | None = None,
     max_evidence: int = MAX_EVIDENCE_REFS,
     reference_time: datetime | None = None,
+    *,
+    probabilistic: bool = False,
+    regime: RegimeClassification | None = None,
 ) -> TrendSummary:
     """Build a complete TrendSummary from weighted signals and impact records."""
     result = assemble_trend_with_evidence(
         ticker, window, signals, impacts, market_ctx, max_evidence, reference_time,
+        probabilistic=probabilistic,
+        regime=regime,
     )
     return result.summary

@@ -665,8 +763,25 @@ def assemble_trend_with_evidence(
     market_ctx: Any | None = None,
     max_evidence: int = MAX_EVIDENCE_REFS,
     reference_time: datetime | None = None,
+    *,
+    probabilistic: bool = False,
+    regime: RegimeClassification | None = None,
 ) -> AssembledTrend:
-    """Build a TrendSummary and return detailed evidence rankings for persistence."""
+    """Build a TrendSummary and return detailed evidence rankings for persistence.
+
+    When ``probabilistic`` is True:
+    - Computes Bayesian posterior from merged signals
+    - Uses Bayesian confidence formula for trend confidence
+    - Uses entropy-based direction classification
+    - Applies regime-adjusted thresholds
+    - Populates probabilistic TrendSummary fields
+    - Stores probabilistic outputs in market_context JSONB
+
+    When ``probabilistic`` is False:
+    - Preserves exact current heuristic behavior (no changes)
+
+    Requirements: 1.1, 1.2, 8.1–8.5, 9.1–9.6, 7.8, 16.4, 16.5
+    """
     if reference_time is None:
         reference_time = datetime.now(timezone.utc)

@@ -677,15 +792,102 @@ def assemble_trend_with_evidence(
         CatalystEntry(document_id=imp.document_id, catalyst_type=imp.catalyst_type)
         for imp in impacts
     ]
-    contradiction_result = detect_contradictions(signals, catalyst_entries)
+    contradiction_result = detect_contradictions(
+        signals, catalyst_entries, probabilistic=probabilistic,
+    )
     contradiction = contradiction_result.score

-    direction = derive_trend_direction(avg_sentiment, contradiction)
-    confidence = compute_trend_confidence(signals, contradiction)
+    if not probabilistic:
+        # --- Heuristic mode: preserve exact current behavior ---
+        direction = derive_trend_direction(avg_sentiment, contradiction)
+        confidence = compute_trend_confidence(signals, contradiction)
+
+        # Get detailed evidence rankings for persistence
+        ev_config = EvidenceRankConfig(max_refs=max_evidence)
+        supporting_ranked, opposing_ranked = rank_evidence_detailed(signals, ev_config)
+
+        supporting = list(dict.fromkeys(r.document_id for r in supporting_ranked))
+        opposing = list(dict.fromkeys(r.document_id for r in opposing_ranked))
+
+        catalysts, risks = extract_catalysts_and_risks(impacts, signals)
+
+        # Trend strength: absolute value of weighted sentiment, clamped to [0, 1]
+        strength = round(min(abs(avg_sentiment), 1.0), 4)
+
+        summary = TrendSummary(
+            entity_type="company",
+            entity_id=ticker,
+            window=TrendWindow(window),
+            trend_direction=direction,
+            trend_strength=strength,
+            confidence=confidence,
+            top_supporting_evidence=supporting,
+            top_opposing_evidence=opposing,
+            dominant_catalysts=catalysts,
+            material_risks=risks,
+            contradiction_score=contradiction,
+            disagreement_details=contradiction_result.details,
+            market_context=market_ctx,
+            generated_at=reference_time,
+        )
+
+        return AssembledTrend(
+            summary=summary,
+            supporting_evidence=supporting_ranked,
+            opposing_evidence=opposing_ranked,
+        )
+
+    # --- Probabilistic mode (Req 8.1–8.5, 9.1–9.6) ---
+
+    # Default to uncertainty regime when not provided (Req 7.9)
+    if regime is None:
+        regime = RegimeClassification(
+            regime=MarketRegime.UNCERTAINTY,
+            trend_indicator=0.0,
+            volatility_ratio=1.0,
+            bullish_threshold=0.15,
+            bearish_threshold=-0.15,
+            contradiction_penalty_multiplier=0.6,
+        )
+
+    # Compute Bayesian posterior from merged signals (Req 1.1, 1.2)
+    posterior: BayesianPosterior = compute_bayesian_posterior(signals)
+
+    # --- Bayesian confidence formula (Req 8.1–8.4) ---
+    # confidence = 0.5 × C_bayesian + 0.25 × F_count + 0.25 × C_avg_credibility - P_contradiction
+    active = [s for s in signals if s.weight.combined > 0]
+    unique_sources = len({s.document_id for s in active if s.document_id}) if active else 0
+    f_count = min(unique_sources / 15.0, 0.8)
+
+    avg_credibility = (
+        sum(s.weight.credibility for s in active) / len(active) if active else 0.0
+    )
+
+    # Contradiction penalty uses regime-adjusted multiplier (Req 7.7)
+    contradiction_penalty = contradiction * regime.contradiction_penalty_multiplier
+
+    confidence = (
+        0.5 * posterior.bayesian_confidence
+        + 0.25 * f_count
+        + 0.25 * avg_credibility
+        - contradiction_penalty
+    )
+    confidence = round(max(0.0, min(1.0, confidence)), 4)
+
+    # --- Entropy-based direction (Req 9.1–9.5) ---
+    # Fixed P_bull thresholds for direction: 0.65 / 0.35
+    if posterior.entropy > 0.9:
+        direction = TrendDirection.MIXED
+    elif posterior.p_bull > 0.65:
+        direction = TrendDirection.BULLISH
+    elif posterior.p_bull < 0.35:
+        direction = TrendDirection.BEARISH
+    else:
+        direction = TrendDirection.NEUTRAL

     # Get detailed evidence rankings for persistence
-    config = EvidenceRankConfig(max_refs=max_evidence)
-    supporting_ranked, opposing_ranked = rank_evidence_detailed(signals, config)
+    ev_config = EvidenceRankConfig(max_refs=max_evidence)
+    supporting_ranked, opposing_ranked = rank_evidence_detailed(signals, ev_config)

     supporting = list(dict.fromkeys(r.document_id for r in supporting_ranked))
     opposing = list(dict.fromkeys(r.document_id for r in opposing_ranked))
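Restated outside the pipeline, the probabilistic confidence is `clamp(0.5 * C_bayes + 0.25 * min(n_sources / 15, 0.8) + 0.25 * mean_credibility - contradiction * m_regime, 0, 1)`, and direction checks entropy before `p_bull`. The sketch below takes the posterior fields as plain floats; `Direction`, `probabilistic_confidence`, and `classify_direction` are hypothetical stand-ins for `TrendDirection` and the inline code, and the enum's string values are assumptions.

```python
from enum import Enum


class Direction(str, Enum):
    """Hypothetical stand-in for TrendDirection; the values are assumptions."""

    BULLISH = "bullish"
    BEARISH = "bearish"
    NEUTRAL = "neutral"
    MIXED = "mixed"


def probabilistic_confidence(
    bayesian_confidence: float,
    unique_sources: int,
    avg_credibility: float,
    contradiction: float,
    contradiction_penalty_multiplier: float,
) -> float:
    """0.5*C_bayes + 0.25*F_count + 0.25*C_cred - penalty, clamped to [0, 1]."""
    f_count = min(unique_sources / 15.0, 0.8)  # caps at 12 distinct sources
    penalty = contradiction * contradiction_penalty_multiplier
    raw = (
        0.5 * bayesian_confidence
        + 0.25 * f_count
        + 0.25 * avg_credibility
        - penalty
    )
    return round(max(0.0, min(1.0, raw)), 4)


def classify_direction(p_bull: float, entropy: float) -> Direction:
    """Entropy is checked first; then fixed P_bull cutoffs of 0.65 and 0.35."""
    if entropy > 0.9:
        return Direction.MIXED
    if p_bull > 0.65:
        return Direction.BULLISH
    if p_bull < 0.35:
        return Direction.BEARISH
    return Direction.NEUTRAL
```

With the default uncertainty regime (multiplier 0.6), a posterior confidence of 0.8 from 9 distinct sources with mean credibility 0.7 and contradiction 0.2 gives 0.4 + 0.15 + 0.175 - 0.12 = 0.605. In this hunk the regime's `bullish_threshold`/`bearish_threshold` do not feed the direction rule; only `contradiction_penalty_multiplier` varies with the regime.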
@@ -695,6 +897,30 @@ def assemble_trend_with_evidence(
     # Trend strength: absolute value of weighted sentiment, clamped to [0, 1]
     strength = round(min(abs(avg_sentiment), 1.0), 4)

+    # Build probabilistic JSONB data for market_context storage
+    probabilistic_data = {
+        "p_bull": round(posterior.p_bull, 6),
+        "alpha": round(posterior.alpha, 4),
+        "beta": round(posterior.beta, 4),
+        "log_likelihood": round(posterior.log_likelihood, 6),
+        "bayesian_confidence": round(posterior.bayesian_confidence, 6),
+        "entropy": round(posterior.entropy, 6),
+        "regime": regime.regime.value,
+        "regime_volatility_ratio": round(regime.volatility_ratio, 4),
+        "pipeline_mode": "probabilistic",
+        "contradiction_entropy": round(contradiction, 4),
+    }
+
+    # Enrich market_context with probabilistic outputs
+    if market_ctx is not None and hasattr(market_ctx, "model_dump"):
+        enriched_ctx_data = market_ctx.model_dump()
+        enriched_ctx_data["probabilistic"] = probabilistic_data
+        enriched_market_ctx = enriched_ctx_data
+    elif isinstance(market_ctx, dict):
+        enriched_market_ctx = {**market_ctx, "probabilistic": probabilistic_data}
+    else:
+        enriched_market_ctx = {"probabilistic": probabilistic_data}
+
     summary = TrendSummary(
         entity_type="company",
         entity_id=ticker,
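The enrichment step accepts three shapes for `market_ctx` and always produces a dict with a `"probabilistic"` key. A standalone sketch of the same branch order; `enrich_market_context` and `FakeMarketContext` (with its `vix` field) are hypothetical, the latter standing in for any object that exposes `model_dump()` as Pydantic v2 models do.

```python
from typing import Any


class FakeMarketContext:
    """Hypothetical stand-in for a model exposing model_dump() (e.g. Pydantic v2)."""

    def __init__(self, vix: float) -> None:
        self.vix = vix

    def model_dump(self) -> dict[str, Any]:
        return {"vix": self.vix}


def enrich_market_context(
    market_ctx: Any, probabilistic_data: dict[str, Any]
) -> dict[str, Any]:
    """Return a dict carrying probabilistic outputs under "probabilistic"."""
    if market_ctx is not None and hasattr(market_ctx, "model_dump"):
        data = market_ctx.model_dump()  # fresh dict from the model
        data["probabilistic"] = probabilistic_data
        return data
    if isinstance(market_ctx, dict):
        # Shallow copy: the caller's dict is left untouched.
        return {**market_ctx, "probabilistic": probabilistic_data}
    # None (or any other type) becomes a dict holding only the new outputs.
    return {"probabilistic": probabilistic_data}
```

Because the dict branch builds a new dict, the caller's `market_ctx` is not mutated; anything that is neither a dumpable model nor a dict is replaced by a dict holding only the probabilistic outputs.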
@@ -708,8 +934,16 @@ def assemble_trend_with_evidence(
         material_risks=risks,
         contradiction_score=contradiction,
         disagreement_details=contradiction_result.details,
-        market_context=market_ctx,
+        market_context=enriched_market_ctx,
         generated_at=reference_time,
+        # Probabilistic fields (Req 9.6, 16.1)
+        p_bull=round(posterior.p_bull, 6),
+        alpha=round(posterior.alpha, 4),
+        beta_param=round(posterior.beta, 4),
+        bayesian_confidence=round(posterior.bayesian_confidence, 6),
+        entropy=round(posterior.entropy, 6),
+        regime=regime.regime.value,
+        pipeline_mode="probabilistic",
     )

     return AssembledTrend(
@@ -782,7 +1016,12 @@ async def persist_trend_summary(
         json.dumps(summary.material_risks),
         summary.contradiction_score,
         json.dumps([d.model_dump() for d in summary.disagreement_details]),
-        json.dumps(summary.market_context.model_dump() if summary.market_context else {}, default=str),
+        json.dumps(
+            summary.market_context.model_dump()
+            if hasattr(summary.market_context, "model_dump")
+            else (summary.market_context if summary.market_context else {}),
+            default=str,
+        ),
         summary.generated_at,
     )
     trend_id = str(row["id"])
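The previous one-liner called `.model_dump()` whenever `market_context` was truthy, which fails with `AttributeError` if a plain dict reaches persistence, as the probabilistic branch can now produce. The replacement checks for `model_dump` first. A sketch of that expression as a hypothetical `serialize_market_context` helper:

```python
import json
from datetime import datetime, timezone
from typing import Any


def serialize_market_context(market_context: Any) -> str:
    """JSON-encode market_context whether it is a model, a dict, or empty."""
    payload = (
        market_context.model_dump()
        if hasattr(market_context, "model_dump")
        else (market_context if market_context else {})
    )
    # default=str is only consulted for values json cannot encode natively.
    return json.dumps(payload, default=str)


if __name__ == "__main__":
    # A dict containing a datetime serializes instead of raising AttributeError.
    print(serialize_market_context({"as_of": datetime(2024, 1, 2, tzinfo=timezone.utc)}))
```

`default=str` writes values such as `datetime` as their `str()` form, so they come back as strings when the JSONB column is read.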
@@ -933,6 +1172,131 @@ async def _build_macro_event_infos(
     return infos


+# ---------------------------------------------------------------------------
+# Regime detection helper (Req 7.1, 7.2, 7.3, 7.8, 7.9)
+# ---------------------------------------------------------------------------
+
+_CLOSING_PRICES_QUERY = """
+    SELECT close
+    FROM market_data_daily
+    WHERE ticker = $1
+    ORDER BY bar_date DESC
+    LIMIT 120
+"""
+
+_DAILY_RETURNS_QUERY = """
+    SELECT (close - LAG(close) OVER (ORDER BY bar_date)) / NULLIF(LAG(close) OVER (ORDER BY bar_date), 0) AS daily_return
+    FROM market_data_daily
+    WHERE ticker = $1
+    ORDER BY bar_date DESC
+    LIMIT 120
+"""
+
+_DAILY_VOLUMES_QUERY = """
+    SELECT volume
+    FROM market_data_daily
+    WHERE ticker = $1
+    ORDER BY bar_date DESC
+    LIMIT 30
+"""
+
+# Default uncertainty regime used when market data is unavailable
+_DEFAULT_UNCERTAINTY_REGIME = RegimeClassification(
+    regime=MarketRegime.UNCERTAINTY,
+    trend_indicator=0.0,
+    volatility_ratio=1.0,
+    bullish_threshold=0.15,
+    bearish_threshold=-0.15,
+    contradiction_penalty_multiplier=0.6,
+)
+
+
+async def _classify_ticker_regime(
+    pool: asyncpg.Pool,
+    ticker: str,
+) -> RegimeClassification:
+    """Classify market regime for a ticker from historical price data.
+
+    Fetches closing prices and daily returns, then delegates to
+    ``classify_regime``. Falls back to the uncertainty regime when
+    market data is unavailable or insufficient.
+
+    Requirements: 7.1, 7.2, 7.3, 7.8, 7.9
+    """
+    try:
+        price_rows = await pool.fetch(_CLOSING_PRICES_QUERY, ticker)
+        if not price_rows:
+            logger.info(
+                "No market data for %s — defaulting to uncertainty regime",
+                ticker,
+            )
+            return _DEFAULT_UNCERTAINTY_REGIME
+
+        # Prices come in DESC order; reverse to chronological
+        closing_prices = [float(r["close"]) for r in reversed(price_rows) if r["close"] is not None]
+
+        return_rows = await pool.fetch(_DAILY_RETURNS_QUERY, ticker)
+        # Returns come in DESC order; reverse to chronological, skip NULLs
+        returns = [
+            float(r["daily_return"])
+            for r in reversed(return_rows)
+            if r["daily_return"] is not None
+        ]
+
+        if not closing_prices or not returns:
+            logger.info(
+                "Insufficient market data for %s — defaulting to uncertainty regime",
+                ticker,
+            )
+            return _DEFAULT_UNCERTAINTY_REGIME
+
+        return classify_regime(closing_prices, returns)
+
+    except Exception:
+        logger.warning(
+            "Failed to classify regime for %s — defaulting to uncertainty regime",
+            ticker,
+            exc_info=True,
+        )
+        return _DEFAULT_UNCERTAINTY_REGIME
+
+
+async def _fetch_ticker_market_data(
+    pool: asyncpg.Pool,
+    ticker: str,
+) -> tuple[list[float] | None, list[float] | None]:
+    """Fetch recent daily returns and volumes for regime multiplier scoring.
+
+    Returns (returns, volumes) where each is a chronological list or None
+    if data is unavailable. Used by the probabilistic scoring pipeline
+    to compute regime multiplier M_regime in ``compute_signal_weight``.
+    """
+    try:
+        return_rows = await pool.fetch(_DAILY_RETURNS_QUERY, ticker)
+        returns = [
+            float(r["daily_return"])
+            for r in reversed(return_rows)
+            if r["daily_return"] is not None
+        ] if return_rows else None
+
+        volume_rows = await pool.fetch(_DAILY_VOLUMES_QUERY, ticker)
+        volumes = [
+            float(r["volume"])
+            for r in reversed(volume_rows)
+            if r["volume"] is not None
+        ] if volume_rows else None
+
+        return returns or None, volumes or None
+    except Exception:
+        logger.warning(
+            "Failed to fetch market data for %s scoring — "
+            "regime multiplier will default to 1.0",
+            ticker,
+            exc_info=True,
+        )
+        return None, None
+
+
 # ---------------------------------------------------------------------------
 # Main aggregation entry point for a single ticker + window
 # ---------------------------------------------------------------------------
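`_DAILY_RETURNS_QUERY` computes simple returns `r_t = (P_t - P_{t-1}) / P_{t-1}` with `LAG`, and `NULLIF` turns a zero previous close into `NULL`, which the Python side then filters out. A pure-Python equivalent over a chronological list of closes, useful for checking the SQL or for feeding `classify_regime` in tests; `simple_daily_returns` is a hypothetical name.

```python
from typing import Optional, Sequence


def simple_daily_returns(closes: Sequence[Optional[float]]) -> list[float]:
    """Chronological closes -> chronological simple returns.

    Mirrors (close - LAG(close)) / NULLIF(LAG(close), 0) followed by the
    Python-side NULL filter: the first bar has no predecessor, and a missing
    close on either side or a zero previous close yields no return.
    """
    returns: list[float] = []
    for prev, curr in zip(closes, closes[1:]):
        if prev is None or curr is None or prev == 0:
            continue
        returns.append((curr - prev) / prev)
    return returns
```

Because window functions are evaluated before `ORDER BY ... LIMIT`, `LAG` sees the ticker's full history, so the oldest of the 120 returned rows still gets a return; only the ticker's first-ever bar yields `NULL`.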
@@ -944,6 +1308,12 @@ async def aggregate_company_window(
     window: str,
     reference_time: datetime | None = None,
     config: AggregationConfig | None = None,
+    *,
+    probabilistic: bool = False,
+    regime: RegimeClassification | None = None,
+    source_accuracy_map: dict[str, float] | None = None,
+    ticker_returns: list[float] | None = None,
+    ticker_volumes: list[float] | None = None,
 ) -> TrendSummary:
     """Compute and persist a trend summary for one ticker and one window.

@@ -954,14 +1324,47 @@ async def aggregate_company_window(
     4. Build weighted signals using the scoring module.
     5. Check macro toggle and fetch/merge macro signals if enabled.
     6. Check competitive toggle and fetch/merge pattern/competitive signals if enabled.
-    7. Assemble the TrendSummary.
+    7. Assemble the TrendSummary (probabilistic or heuristic).
     8. Persist to trend_windows table.

+    When ``probabilistic`` is True, the scoring config is set to
+    probabilistic mode, source accuracy factors are passed to signal
+    scoring, and macro integration uses the conditional modifier.
+
     Returns the assembled TrendSummary.
     """
     cfg = config or AggregationConfig()
     scoring_cfg = cfg.effective_scoring()

+    # When probabilistic mode is active, create a scoring config with
+    # probabilistic=True so all downstream scoring uses the new formulas.
+    if probabilistic and not scoring_cfg.probabilistic:
+        scoring_cfg = ScoringConfig(
+            half_life_hours=scoring_cfg.half_life_hours,
+            min_recency_weight=scoring_cfg.min_recency_weight,
+            credibility_floor=scoring_cfg.credibility_floor,
+            credibility_ceiling=scoring_cfg.credibility_ceiling,
+            credibility_exponent=scoring_cfg.credibility_exponent,
+            novelty_bonus_max=scoring_cfg.novelty_bonus_max,
+            confidence_floor=scoring_cfg.confidence_floor,
+            volatility_recency_boost_threshold=scoring_cfg.volatility_recency_boost_threshold,
+            volatility_recency_boost_max=scoring_cfg.volatility_recency_boost_max,
+            volume_surge_threshold_pct=scoring_cfg.volume_surge_threshold_pct,
+            volume_surge_boost=scoring_cfg.volume_surge_boost,
+            probabilistic=True,
+            sigmoid_steepness=scoring_cfg.sigmoid_steepness,
+            sigmoid_midpoint=scoring_cfg.sigmoid_midpoint,
+            info_gain_lambda=scoring_cfg.info_gain_lambda,
+            info_gain_max=scoring_cfg.info_gain_max,
+            default_base_rate=scoring_cfg.default_base_rate,
+            adaptive_decay_impact_scale=scoring_cfg.adaptive_decay_impact_scale,
+            adaptive_decay_surprise_scale=scoring_cfg.adaptive_decay_surprise_scale,
+            adaptive_decay_market_scale=scoring_cfg.adaptive_decay_market_scale,
+            regime_return_weight=scoring_cfg.regime_return_weight,
+            regime_volume_weight=scoring_cfg.regime_volume_weight,
+            regime_multiplier_max=scoring_cfg.regime_multiplier_max,
+        )
+
     if reference_time is None:
         reference_time = datetime.now(timezone.utc)

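The probabilistic branch rebuilds `ScoringConfig` field by field just to flip one flag, so any field added to `ScoringConfig` later would silently revert to its default here unless this list is updated too. If `ScoringConfig` is a dataclass (an assumption; its definition is not part of this diff), `dataclasses.replace` makes the same copy in one call. The sketch uses a hypothetical, trimmed-down `ScoringConfigSketch` whose defaults are placeholders, not the service's real values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScoringConfigSketch:
    """Hypothetical, trimmed-down stand-in for ScoringConfig (placeholder defaults)."""

    half_life_hours: float = 48.0
    credibility_floor: float = 0.1
    sigmoid_steepness: float = 4.0
    probabilistic: bool = False


def with_probabilistic(cfg: ScoringConfigSketch) -> ScoringConfigSketch:
    """Return cfg if already probabilistic, else a copy with only that flag changed."""
    if cfg.probabilistic:
        return cfg
    # replace() copies every field and overrides only the keywords given.
    return replace(cfg, probabilistic=True)
```

If `ScoringConfig` is instead a Pydantic v2 model, the analogous call is `cfg.model_copy(update={"probabilistic": True})`.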
@@ -975,9 +1378,13 @@ async def aggregate_company_window(
     # 2. Fetch market context
     market_ctx = await fetch_market_context(pool, ticker, window, reference_time)

-    # 3. Build weighted signals
+    # 3. Build weighted signals — pass source accuracy and market data
+    # when in probabilistic mode (Req 4.1–4.3, 6.1–6.5)
     signals = build_weighted_signals(
         impacts, reference_time, window, market_ctx, scoring_cfg,
+        source_accuracy_map=source_accuracy_map if probabilistic else None,
+        returns=ticker_returns if probabilistic else None,
+        volumes=ticker_volumes if probabilistic else None,
     )

     # 4. Check macro toggle and merge macro signals
@@ -991,6 +1398,7 @@ async def aggregate_company_window(
     if db_toggle is not None:
         macro_enabled = db_toggle

+    macro_modifier = 1.0
     if macro_enabled:
         macro_impacts = await fetch_macro_impact_records(
             pool, ticker, window_start, reference_time,
@@ -1002,11 +1410,31 @@ async def aggregate_company_window(
             window,
             macro_signal_weight=cfg.macro_signal_weight,
             config=scoring_cfg,
+            returns=ticker_returns if probabilistic else None,
+            volumes=ticker_volumes if probabilistic else None,
         )
-        signals = signals + macro_signals
+
+        if probabilistic:
+            # Probabilistic mode: use conditional macro modifier (Req 11.1–11.5)
+            company_direction = derive_trend_direction(
+                weighted_sentiment_average(signals),
+            ).value
+            signals, macro_modifier = integrate_macro_signals(
+                company_signals=signals,
+                macro_signals=macro_signals,
+                company_direction=company_direction,
+                macro_impacts=macro_impacts,
+                ticker=ticker,
+                probabilistic=True,
+                macro_signal_weight=cfg.macro_signal_weight,
+            )
+        else:
+            # Heuristic mode: simple additive merge (current behavior)
+            signals = signals + macro_signals
+
         logger.info(
-            "Merged %d macro signals for %s/%s",
-            len(macro_signals), ticker, window,
+            "Merged %d macro signals for %s/%s (modifier=%.4f)",
+            len(macro_signals), ticker, window, macro_modifier,
         )

     # 5. Check competitive toggle and merge pattern/competitive signals
@@ -1065,9 +1493,17 @@ async def aggregate_company_window(
         market_ctx=market_ctx if market_ctx.has_data else None,
         max_evidence=cfg.max_evidence,
         reference_time=reference_time,
+        probabilistic=probabilistic,
+        regime=regime,
     )
     summary = assembled.summary

+    # 6b. Enrich probabilistic JSONB with macro modifier (Req 16.2)
+    if probabilistic and macro_modifier != 1.0:
+        ctx = summary.market_context
+        if isinstance(ctx, dict) and "probabilistic" in ctx:
+            ctx["probabilistic"]["macro_modifier"] = round(macro_modifier, 4)
+
     # 7. Persist trend window
     trend_id = await persist_trend_summary(pool, summary)

@@ -1136,10 +1572,80 @@ async def aggregate_company(
     if reference_time is None:
         reference_time = datetime.now(timezone.utc)

+    # Read probabilistic scoring flag once per cycle (Requirement 16.7).
+    # Mid-cycle changes take effect on the next cycle.
+    probabilistic = await fetch_probabilistic_scoring_enabled(pool)
+    pipeline_mode = "probabilistic" if probabilistic else "heuristic"
+    logger.info(
+        "Aggregation cycle for %s: pipeline_mode=%s",
+        ticker,
+        pipeline_mode,
+    )
+
+    # --- Regime detection (Req 7.1, 7.2, 7.3, 7.8, 7.9) ---
+    # Classify market regime for this ticker using closing prices and returns.
+    # Default to uncertainty regime when market data is unavailable.
+    regime: RegimeClassification | None = None
+    ticker_returns: list[float] | None = None
+    ticker_volumes: list[float] | None = None
+    source_accuracy_map: dict[str, float] | None = None
+
+    if probabilistic:
+        regime = await _classify_ticker_regime(pool, ticker)
+        logger.info(
+            "Regime for %s: %s (trend_indicator=%.1f, vol_ratio=%.2f, "
+            "bullish_threshold=%.2f, contradiction_mult=%.1f)",
+            ticker,
+            regime.regime.value,
+            regime.trend_indicator,
+            regime.volatility_ratio,
+            regime.bullish_threshold,
+            regime.contradiction_penalty_multiplier,
+        )
+
+        # Fetch market data (returns/volumes) for regime multiplier in scoring
+        # (Req 6.1–6.5). Fetched once per cycle and reused across all windows.
+        ticker_returns, ticker_volumes = await _fetch_ticker_market_data(pool, ticker)
+
+        # Batch-fetch source accuracy for all sources in the signal set
+        # (Req 4.1–4.3). Fetched once per cycle; individual signals look up
+        # their factor from this map. DB errors default to empty map (factor 1.0).
+        try:
+            # Fetch all source IDs from the longest window to cover all signals
+            longest_window = max(
+                cfg.effective_windows(),
+                key=lambda w: WINDOW_DURATIONS.get(w, timedelta(days=7)),
+            )
+            longest_duration = WINDOW_DURATIONS.get(longest_window, timedelta(days=90))
+            window_start = reference_time - longest_duration
+            all_impacts = await fetch_impact_records(pool, ticker, window_start, reference_time)
+            source_ids = list({imp.document_id for imp in all_impacts})
+            if source_ids:
+                sa_records = await fetch_source_accuracy(pool, source_ids)
+                source_accuracy_map = {
+                    sid: sa.accuracy_factor for sid, sa in sa_records.items()
+                }
+                logger.info(
+                    "Fetched source accuracy for %s: %d/%d sources have records",
+                    ticker, len(sa_records), len(source_ids),
+                )
+        except Exception:
+            logger.warning(
+                "Failed to fetch source accuracy for %s — defaulting to neutral factor",
+                ticker,
+                exc_info=True,
+            )
+            source_accuracy_map = None
+
     summaries: list[TrendSummary] = []
     for window in cfg.effective_windows():
         summary = await aggregate_company_window(
             pool, ticker, window, reference_time, cfg,
+            probabilistic=probabilistic,
+            regime=regime,
+            source_accuracy_map=source_accuracy_map,
+            ticker_returns=ticker_returns,
+            ticker_volumes=ticker_volumes,
         )
         summaries.append(summary)

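The source-accuracy prefetch scans the longest configured window once so that a single lookup covers every window's documents. As written, an unrecognized window name ranks as 7 days inside `max()` but spans 90 days once selected. A sketch that uses one fallback for both steps; `prefetch_start` is hypothetical, and the `WINDOW_DURATIONS` contents below are placeholders for the real mapping.

```python
from datetime import datetime, timedelta, timezone

# Placeholder mapping; the real WINDOW_DURATIONS lives elsewhere in the service.
WINDOW_DURATIONS: dict[str, timedelta] = {
    "1d": timedelta(days=1),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}


def prefetch_start(
    windows: list[str],
    reference_time: datetime,
    fallback: timedelta = timedelta(days=90),
) -> datetime:
    """Start of the longest window, using one fallback for ranking and span alike."""
    durations = [WINDOW_DURATIONS.get(w, fallback) for w in windows]
    return reference_time - max(durations)


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    print(prefetch_start(["1d", "30d", "7d"], now))  # 30 days before `now`
```

Ranking by duration values directly, rather than via a `key=` lookup with its own default, keeps the chosen window and the fetched span in agreement.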