feat: wire live decision loop and enable paper trading

Phase 2 of the autonomous trading engine:

- Replace start()/stop() stubs with real async implementations
- Decision loop: polls recommendations from PostgreSQL, deduplicates via Redis, evaluates them through the full pipeline, and submits orders to stonks:queue:broker_orders
- Stop-loss monitor: fetches prices from the Polygon API, checks stop/take-profit crossings, submits immediate sell orders, and issues a safety sell after 15 minutes without price data
- Performance loop: computes metrics every 5 minutes during market hours and persists a daily snapshot at market close
- Risk tier scheduler: evaluates daily at 16:00 ET and persists tier changes
- Rebalance scheduler: evaluates Mondays at 09:45 ET and respects the circuit breaker
- Notification dispatch: SNS + Gmail with rate limiting and retry
- Backtest replay: fetches historical data, simulates decisions, and persists results
- Real asyncpg/Redis connections in the FastAPI lifespan (with graceful degradation)
- Migration 019: enable paper trading with the conservative tier and a cap of 5 open positions
- Add max_open_positions to TradingConfig, loaded from an env var
- Add Phase 2 tasks to the autonomous-trading-engine spec
2026-04-15 20:52:28 +00:00
parent c4b90a5224
commit 70bad7709a
8 changed files with 2159 additions and 28 deletions
@@ -654,3 +654,309 @@ This plan implements a fully autonomous trading engine as a new service (`servic
 - Migration number 018 is the next available migration slot
 - Frontend components use the existing React 19 + TypeScript + Tailwind + TanStack + Recharts stack
 - Dashboard proxy needs `/trading/` → `trading-engine:8000` added to nginx.conf
+
+
+## Phase 2: Live Wiring and Paper Trading
+
+### Overview
+
+Phase 1 (Tasks 1–26) implemented all pure computation modules, property tests, FastAPI endpoints, Helm chart, and frontend panels. Phase 2 replaces the lifecycle stubs in `services/trading/engine.py` with real async implementations and wires all sub-components into live loops backed by PostgreSQL and Redis. It also adds notification dispatch and backtest replay, connects real PostgreSQL and Redis clients in the FastAPI lifespan, enables the paper trading configuration, and adds integration tests for the live wiring.
+
+### Tasks
+
+- [ ] 27. Wire the live decision loop in `services/trading/engine.py`
+  - [ ] 27.1 Replace `start()` stub with real async implementation
+    - Load `trading_engine_config` from PostgreSQL via `self.pool`
+    - Load active risk tier parameters from `risk_tier_history` (latest entry) or fall back to config default
+    - Sync portfolio state from the Broker Service: fetch positions and account balance via `asyncpg` queries against the broker's `orders` / `positions` tables
+    - Load reserve pool balance from `reserve_pool_ledger` (latest `balance_after`)
+    - Load circuit breaker status from `circuit_breaker_events` (unresolved events)
+    - Load open stop-loss/take-profit levels from `position_stop_levels` where `active = TRUE`
+    - Populate `self.portfolio_state` with loaded data
+    - Create `asyncio.Task` instances for `_decision_loop()`, `_stop_loss_monitor()`, `_performance_loop()`, `_risk_tier_scheduler()`, `_rebalance_scheduler()` and store in `self._tasks: list[asyncio.Task]`
+    - Set `self.running = True` only after successful state load
+    - If portfolio state cannot be loaded, enter a degraded state (readiness probe reports unhealthy) and retry every 30 seconds
+    - _Requirements: 1.6, 18.5_
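
The startup sequence in 27.1 boils down to "load everything, and keep retrying until the load succeeds". A minimal sketch, assuming a hypothetical `load_state()` coroutine that performs all of the loads listed above and raises on failure; `EngineStartup` and its `ready` flag are illustrative stand-ins for the engine and its readiness probe, not the real class:

```python
from __future__ import annotations

import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("trading.engine")


class EngineStartup:
    """Illustrative stand-in for TradingEngine.start(); not the real class."""

    def __init__(self, load_state: Callable[[], Awaitable[dict]], retry_seconds: float = 30.0) -> None:
        # Hypothetical loader: config, risk tier, positions, reserve pool, breakers, stop levels.
        self._load_state = load_state
        self.retry_seconds = retry_seconds
        self.portfolio_state: dict | None = None
        self.running = False
        self.ready = False  # what GET /ready would report

    async def start(self) -> None:
        while True:
            try:
                state = await self._load_state()
                break
            except Exception:
                # Degraded: stay not-ready and retry after the interval.
                log.exception("portfolio state load failed; retrying in %.0fs", self.retry_seconds)
                await asyncio.sleep(self.retry_seconds)
        self.portfolio_state = state
        self.running = True  # set only after a successful load
        self.ready = True
        # ...the real start() would now create the loop tasks (task 27.6)
```

Because the retry loop can run indefinitely, the real `start()` may be better launched as a background task from the FastAPI lifespan, so the server can come up and answer `/health` while `/ready` stays unhealthy.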
+
+  - [ ] 27.2 Replace `stop()` stub with real async shutdown
+    - Set `self.running = False`
+    - Cancel all tasks in `self._tasks` and `await asyncio.gather(*self._tasks, return_exceptions=True)`
+    - Persist current portfolio state snapshot to `portfolio_snapshots`
+    - Close any pending gradual entry tranches
+    - Log shutdown event
+    - _Requirements: 1.6, 16.4_
+
+  - [ ] 27.3 Implement `_decision_loop()` coroutine
+    - `while self.running`: sleep for `self.config.polling_interval_seconds`, then poll recommendations
+    - Poll recommendations from PostgreSQL: `SELECT * FROM recommendations WHERE action IN ('buy','sell') AND mode IN ('paper_eligible','live_eligible') AND generated_at > $1 ORDER BY confidence DESC`
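+    - For each recommendation, check the Redis deduplication key `stonks:dedupe:trading:{recommendation_id}` (24h TTL) and skip the recommendation if the key is already set
+    - Set the dedupe key immediately before evaluation to prevent double-processing on restart; doing the check and the set as one atomic `SET key value NX EX 86400` avoids a race where two pollers both see the key as unset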
+    - Call `self.evaluate_recommendation()` (existing synchronous method) with current portfolio state, risk tier, circuit breaker state, correlation matrix, and earnings calendar
+    - For "act" decisions: generate order job payload matching existing broker queue schema, push to `stonks:queue:broker_orders` via Redis RPUSH; handle gradual entry for large positions
+    - Call `_persist_decision()` for every decision (act or skip)
+    - Update `self.portfolio_state` after each acted decision (reduce active pool, increment open position count)
+    - Wrap each recommendation evaluation in try/except — on failure, persist skip decision with reason `evaluation_error` and continue
+    - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_
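
The per-recommendation body of the decision loop (27.3) could look like the sketch below. It assumes a `redis.asyncio`-style client, where `set(..., nx=True, ex=...)` returns `True` when the key was created and `None` when it already existed. The job fields and the `evaluate` / `persist` callables are hypothetical placeholders for `evaluate_recommendation()`, the existing broker queue schema, and `_persist_decision()`:

```python
from __future__ import annotations

import json
from typing import Any, Awaitable, Callable

DEDUPE_TTL_SECONDS = 24 * 60 * 60
BROKER_QUEUE = "stonks:queue:broker_orders"


async def process_recommendation(
    redis: Any,
    rec: dict,
    evaluate: Callable[[dict], dict],
    persist: Callable[[dict], Awaitable[None]],
) -> dict | None:
    """One pass of the decision loop body for a single recommendation row."""
    key = f"stonks:dedupe:trading:{rec['id']}"
    # SET NX EX checks and claims the key in one round trip; None means already claimed.
    if not await redis.set(key, "1", nx=True, ex=DEDUPE_TTL_SECONDS):
        return None
    try:
        decision = evaluate(rec)  # stands in for the synchronous evaluate_recommendation()
    except Exception as exc:
        decision = {"recommendation_id": rec["id"], "decision": "skip",
                    "skip_reason": "evaluation_error", "detail": str(exc)}
    if decision["decision"] == "act":
        # Hypothetical payload: the real job must match the existing broker queue schema.
        job = {"recommendation_id": rec["id"], "ticker": rec["ticker"],
               "side": rec["action"], "quantity": decision["quantity"]}
        await redis.rpush(BROKER_QUEUE, json.dumps(job))
    await persist(decision)  # every decision, act or skip, is persisted
    return decision
```

Claiming the key before evaluation makes processing at-most-once: if the process dies between the claim and the `RPUSH`, that recommendation is skipped for 24 hours rather than risking a duplicate order.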
+
+  - [ ] 27.4 Implement `_sync_positions_and_siphon()` helper
+    - Fetch current positions and account balance from PostgreSQL (broker tables)
+    - Detect newly closed positions by comparing with previous `self.portfolio_state.positions`
+    - For each profitable close: call `self.reserve_pool_controller.siphon_profit()` with realized profit and current reserve balance
+    - Persist siphon event to `reserve_pool_ledger` via `self.pool`
+    - Update `self.portfolio_state` with refreshed positions, active pool, reserve pool
+    - Trigger notification for large trade P&L events (> 5% of Active Pool)
+    - _Requirements: 3.1, 3.2, 3.3, 19.2_
+
+  - [ ] 27.5 Implement `_persist_decision()` helper
+    - INSERT into `trading_decisions` table with all fields from the `TradingDecision` dataclass
+    - Use `self.pool.execute()` with parameterized query
+    - Log decision summary (ticker, decision, skip_reason if any)
+    - _Requirements: 1.4, 17.1_
+
+  - [ ] 27.6 Implement asyncio task management
+    - Add `self._tasks: list[asyncio.Task] = []` to `__init__()`
+    - In `start()`, create named tasks: `asyncio.create_task(self._decision_loop(), name="decision_loop")`, etc.
+    - In `stop()`, cancel all tasks, await with `return_exceptions=True`, clear the list
+    - Add error handling: if a task raises an unexpected exception, log it and restart the task (unless `self.running` is False)
+    - _Requirements: 1.1, 1.6_
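
For 27.6, one way to get "restart on unexpected exception, unless stopping" is to wrap each loop in a small supervisor coroutine. A sketch assuming the loops are passed as zero-argument coroutine functions; `TaskSupervisor` and the 5-second default restart delay are illustrative, not part of the spec:

```python
from __future__ import annotations

import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("trading.engine")

LoopFactory = Callable[[], Awaitable[None]]


class TaskSupervisor:
    """Hypothetical stand-in for the engine's self._tasks / self.running bookkeeping."""

    def __init__(self, restart_delay: float = 5.0) -> None:
        self.running = False
        self.restart_delay = restart_delay
        self._tasks: list[asyncio.Task] = []

    def start(self, loops: dict[str, LoopFactory]) -> None:
        self.running = True
        for name, factory in loops.items():
            # Named tasks make logs and asyncio debug output easier to read.
            self._tasks.append(asyncio.create_task(self._supervise(name, factory), name=name))

    async def _supervise(self, name: str, factory: LoopFactory) -> None:
        while self.running:
            try:
                # Call the factory each time: a coroutine object cannot be awaited twice.
                await factory()
                return  # loop exited on its own (running became False)
            except Exception:
                # CancelledError is a BaseException (Python 3.8+), so cancellation is not caught here.
                log.exception("task %s crashed; restarting in %.1fs", name, self.restart_delay)
                await asyncio.sleep(self.restart_delay)

    async def stop(self) -> None:
        self.running = False
        for task in self._tasks:
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)
        self._tasks.clear()
```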
+
+- [ ] 28. Wire the stop-loss monitoring loop
+  - [ ] 28.1 Implement `_stop_loss_monitor()` coroutine in `services/trading/engine.py`
+    - `while self.running`: sleep for `self.config.stop_loss_check_interval_seconds` (default 300s, or `fast_stop_loss_interval_seconds` = 60s during high-severity events)
+    - Call `_load_open_positions()` and `_load_stop_levels()` from PostgreSQL
+    - Call `_fetch_current_prices()` for all tickers with open positions
+    - Call `self.check_stop_loss_crossings(positions, prices, stop_levels)` (existing method delegates to StopLossManager)
+    - For each `StopTrigger` returned: generate immediate market sell order, push to `stonks:queue:broker_orders` via Redis
+    - Persist stop-loss trigger event to `trading_decisions` with decision="act" and trace noting stop-loss/take-profit trigger
+    - _Requirements: 4.3, 4.4, 4.5, 7.4_
+
+  - [ ] 28.2 Implement `_fetch_current_prices()` helper
+    - Query the market data adapter (Polygon API) for current/latest prices of given tickers
+    - Use `services/shared/config.py` `MarketDataConfig` for API key and base URL
+    - Return `dict[str, float]` mapping ticker → latest price
+    - On API failure: log a warning and omit the failed tickers from the returned dict (callers treat a missing ticker as missing price data)
+    - _Requirements: 4.3, 4.8_
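
A sketch of the failure handling in 28.2, assuming a hypothetical `fetch_one(ticker)` coroutine that wraps the Polygon request built from `MarketDataConfig` and raises on any API error; the concurrency cap of 5 is an arbitrary placeholder for whatever rate limit the API key allows:

```python
from __future__ import annotations

import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("trading.prices")


async def fetch_current_prices(
    tickers: list[str],
    fetch_one: Callable[[str], Awaitable[float]],
    max_concurrency: int = 5,
) -> dict[str, float]:
    """Return ticker -> latest price, omitting tickers whose fetch failed."""
    sem = asyncio.Semaphore(max_concurrency)  # stay under the data provider's rate limit

    async def guarded(ticker: str) -> float:
        async with sem:
            return await fetch_one(ticker)

    # return_exceptions=True: one failing ticker does not cancel the others.
    results = await asyncio.gather(*(guarded(t) for t in tickers), return_exceptions=True)
    prices: dict[str, float] = {}
    for ticker, result in zip(tickers, results):
        if isinstance(result, BaseException):
            log.warning("price fetch failed for %s: %r", ticker, result)
        else:
            prices[ticker] = float(result)
    return prices
```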
+
+  - [ ] 28.3 Implement `_load_open_positions()` and `_load_stop_levels()` helpers
+    - `_load_open_positions()`: query broker/positions tables via `self.pool` to get current open positions, return as `list[OpenPosition]`
+    - `_load_stop_levels()`: query `position_stop_levels WHERE active = TRUE` via `self.pool`, return as `dict[str, StopLevels]` keyed by ticker
+    - _Requirements: 4.3, 18.3_
+
+  - [ ] 28.4 Implement safety sell for missing price data
+    - Track last successful price fetch timestamp per ticker
+    - If a ticker has had no price data for more than 15 minutes of market time (checked via `is_market_open()`; time while the market is closed does not count), generate a market sell order for that position
+    - Log warning with ticker and duration of missing data
+    - _Requirements: 4.8_
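
The 15-minute rule in 28.4 needs a per-ticker clock that only advances during market hours; otherwise every position would look stale at the next open. A minimal sketch, assuming the caller supplies `now` and the result of `is_market_open()`, and that the monitor keeps ticking while the market is closed; `PriceStalenessTracker` is a hypothetical helper name:

```python
from __future__ import annotations

from datetime import datetime, timedelta


class PriceStalenessTracker:
    """Hypothetical helper: per-ticker time since the last price, counted in market hours."""

    def __init__(self, max_gap: timedelta = timedelta(minutes=15)) -> None:
        self.max_gap = max_gap
        self._last_seen: dict[str, datetime] = {}

    def record(self, prices: dict[str, float], now: datetime) -> None:
        """Call with the price fetch output; only tickers that returned a price count."""
        for ticker in prices:
            self._last_seen[ticker] = now

    def stale_tickers(
        self, open_tickers: list[str], now: datetime, market_open: bool
    ) -> list[tuple[str, timedelta]]:
        """Return (ticker, gap) for positions that should get a safety sell."""
        if not market_open:
            # Restart every clock while closed so the overnight gap is not counted.
            for ticker in open_tickers:
                self._last_seen[ticker] = now
            return []
        stale = []
        for ticker in open_tickers:
            last = self._last_seen.setdefault(ticker, now)  # clock starts at first sighting
            gap = now - last
            if gap > self.max_gap:
                stale.append((ticker, gap))
        return stale
```

A ticker's clock starts the first time it is seen, so a newly opened position is not sold before it has had 15 minutes to produce a price.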
+
+- [ ] 29. Wire the performance metrics loop
+  - [ ] 29.1 Implement `_performance_loop()` coroutine in `services/trading/engine.py`
+    - `while self.running`: sleep 300 seconds (5 minutes)
+    - Check if currently within market hours via `is_market_open()`; skip computation if outside market hours
+    - Call `self.performance_tracker.compute_metrics()` with the current portfolio state loaded via `self.pool`
+    - Update `self.portfolio_state` with latest metrics (portfolio heat, unrealized P&L, etc.)
+    - _Requirements: 14.1_
+
+  - [ ] 29.2 Implement daily snapshot persistence
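+    - Once per trading day, after the 4:00 PM ET close, call `self.performance_tracker.persist_daily_snapshot()` to write to the `portfolio_snapshots` table via `self.pool`; this check must run even though the 5-minute metric computation is skipped outside market hours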
+    - Include end-of-day portfolio value, daily return, cumulative return, all positions with unrealized P&L, and computed metrics
+    - _Requirements: 14.3_
+
+  - [ ] 29.3 Wire performance tracker to use real database pool
+    - Pass `self.pool` to `PerformanceTracker` so it can query closed trades from `trading_decisions` and broker fill tables
+    - Compute the Sharpe ratio from the trailing 30 days of daily returns in `portfolio_snapshots`
+    - Compute win/loss counts and profit factor from closed trade records
+    - _Requirements: 14.1, 14.2_
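
The metrics in 29.3 can be computed from plain lists once the rows are fetched. A sketch assuming a zero risk-free rate by default and the common 252-trading-day annualization; neither convention is fixed by the spec:

```python
from __future__ import annotations

import math
import statistics

TRADING_DAYS_PER_YEAR = 252  # common annualization convention (assumption)


def daily_returns(values: list[float]) -> list[float]:
    """Simple returns between consecutive end-of-day portfolio values."""
    return [curr / prev - 1.0 for prev, curr in zip(values, values[1:])]


def annualized_sharpe(returns: list[float], risk_free_daily: float = 0.0) -> float | None:
    """Mean excess daily return over its sample standard deviation, scaled by sqrt(252)."""
    if len(returns) < 2:
        return None
    excess = [r - risk_free_daily for r in returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        return None
    return statistics.mean(excess) / sd * math.sqrt(TRADING_DAYS_PER_YEAR)


def trade_stats(pnls: list[float]) -> dict:
    """Win/loss counts and profit factor (gross profit / gross loss) from closed trades."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    gross_loss = -sum(losses)
    if gross_loss > 0:
        profit_factor = sum(wins) / gross_loss
    else:
        profit_factor = math.inf if wins else None  # undefined with no losing trades
    return {"wins": len(wins), "losses": len(losses), "profit_factor": profit_factor}
```

For the trailing 30-day Sharpe ratio, pass the last 31 end-of-day values from `portfolio_snapshots` to `daily_returns()`, which yields 30 returns.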
+
+- [ ] 30. Wire risk tier and rebalance schedulers
+  - [ ] 30.1 Implement `_risk_tier_scheduler()` coroutine in `services/trading/engine.py`
+    - `while self.running`: compute seconds until next 16:00 ET, sleep until then
+    - Load latest `PerformanceMetrics` from `portfolio_snapshots` or compute fresh
+    - Compute `reserve_pct = self.portfolio_state.reserve_pool / self.portfolio_state.total_value`
+    - Call `self.evaluate_risk_tier(current_tier, metrics, reserve_pct)` (existing method delegates to RiskTierController)
+    - If tier changed: persist to `risk_tier_history` via `self.pool`, update `self.config.risk_tier`, trigger notification via `self.create_alert("risk_tier_changed", ...)`
+    - _Requirements: 5.2, 5.5, 19.2_
+
+  - [ ] 30.2 Implement `_rebalance_scheduler()` coroutine in `services/trading/engine.py`
+    - `while self.running`: compute seconds until next Monday 09:45 ET, sleep until then
+    - Load current positions and active risk tier
+    - Call `self.evaluate_rebalancing(positions, risk_tier, active_pool)` (existing method delegates to PortfolioRebalancer)
+    - For each rebalance order returned: generate order job with `rebalance` tag in decision trace, push to `stonks:queue:broker_orders`
+    - Persist rebalance decisions to `trading_decisions` table
+    - Respect circuit breaker status — skip rebalancing if any circuit breaker is active
+    - _Requirements: 8.1, 8.5, 8.6_
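
Both schedulers in task 30 need "seconds until the next HH:MM in New York time", and the result must stay correct across daylight-saving changes. A sketch using the standard-library `zoneinfo` module; `seconds_until` is a hypothetical helper, and it does not skip market holidays:

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")


def seconds_until(now: datetime, hour: int, minute: int, weekday: int | None = None) -> float:
    """Seconds from timezone-aware `now` to the next hour:minute ET.

    `weekday` uses datetime.weekday() numbering (Monday=0); None means every day.
    """
    local = now.astimezone(ET)
    target = local.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if weekday is not None:
        target += timedelta(days=(weekday - local.weekday()) % 7)
    if target <= local:
        target += timedelta(days=7 if weekday is not None else 1)
    # Subtract in UTC: aware datetimes sharing one tzinfo are subtracted as wall-clock
    # times, which would be an hour off across a DST change.
    return (target.astimezone(timezone.utc) - now.astimezone(timezone.utc)).total_seconds()
```

`seconds_until(now, 16, 0)` serves the risk tier scheduler and `seconds_until(now, 9, 45, weekday=0)` the Monday rebalance, with `now = datetime.now(timezone.utc)`.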
+
+- [ ] 31. Wire notification dispatch
+  - [ ] 31.1 Create `services/trading/notification_dispatch.py` with `NotificationDispatcher` class
+    - Accept `pool`, `redis`, and `TradingConfig` in constructor
+    - Implement `dispatch(event_type: str, message: str)` method that routes to enabled channels
+    - Check `self.config.sns_topic_arn` / `self.config.gmail_recipient` to determine enabled channels
+    - Call `_send_sns()` and/or `_send_gmail()` based on enabled channels
+    - Persist notification record to `notifications` table via `self.pool` with channel, event_type, message, delivery_status, timestamp
+    - _Requirements: 19.1, 19.8_
+
+  - [ ] 31.2 Implement SNS delivery via `boto3`
+    - Implement `_send_sns(event_type: str, message: str)` method
+    - Use `boto3.client("sns")`; boto3's default credential chain reads `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` from the environment, and a region must also be configured (e.g. `AWS_DEFAULT_REGION`)
+    - Publish to configured `sns_topic_arn` with message and subject based on event_type
+    - Return delivery status (delivered/failed)
+    - _Requirements: 19.5_
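
A sketch of `_send_sns()` as a free function, assuming `boto3` is installed and its default credential chain supplies credentials and region. `publish(TopicArn=..., Message=..., Subject=...)` is the real SNS client call; the `[stonks]` subject prefix is an illustrative choice, and SNS requires subjects shorter than 100 characters:

```python
from __future__ import annotations

import asyncio
import logging
from typing import Any

log = logging.getLogger("trading.notify")


def _default_sns_client() -> Any:
    import boto3  # lazy import so this module loads without boto3

    # boto3's default credential chain reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
    # (and AWS_DEFAULT_REGION) from the environment; nothing is passed explicitly.
    return boto3.client("sns")


async def send_sns(topic_arn: str, event_type: str, message: str, client: Any = None) -> str:
    """Publish one notification; returns "delivered" or "failed"."""
    client = client if client is not None else _default_sns_client()
    subject = f"[stonks] {event_type}"[:99]  # SNS subjects must be under 100 characters
    try:
        # publish() blocks on HTTP, so run it in a worker thread.
        await asyncio.to_thread(client.publish, TopicArn=topic_arn, Message=message, Subject=subject)
        return "delivered"
    except Exception:
        log.exception("SNS publish failed for event %s", event_type)
        return "failed"
```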
+
+  - [ ] 31.3 Implement Gmail delivery via `google-api-python-client`
+    - Implement `_send_gmail(event_type: str, message: str)` method
+    - Use `google.oauth2.credentials.Credentials` built from a refresh token, client ID, and client secret read from environment variables (all three are needed to refresh the access token)
+    - Build Gmail API service, create MIME message, send via `users().messages().send()`
+    - Configurable sender (`self.config.gmail_sender`) and recipient (`self.config.gmail_recipient`)
+    - Return delivery status (delivered/failed)
+    - _Requirements: 19.6_
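
For the Gmail path in 31.3, the Gmail API's `users().messages().send()` expects the full MIME message base64url-encoded in a `raw` field. A sketch assuming `google-auth` and `google-api-python-client` are installed; the `GMAIL_*` environment variable names are hypothetical:

```python
from __future__ import annotations

import base64
import os
from email.message import EmailMessage

GMAIL_SEND_SCOPE = "https://www.googleapis.com/auth/gmail.send"


def build_gmail_body(sender: str, recipient: str, subject: str, text: str) -> dict:
    """Gmail API send() body: the RFC 2822 message, base64url-encoded, in "raw"."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(text)
    return {"raw": base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")}


def send_gmail(sender: str, recipient: str, subject: str, text: str) -> str:
    """Blocking call; from async code run it via asyncio.to_thread()."""
    # Imported lazily so the module imports without the Google client libraries.
    from google.oauth2.credentials import Credentials
    from googleapiclient.discovery import build

    # Hypothetical env var names; missing configuration raises KeyError here.
    creds = Credentials(
        token=None,  # no cached access token: refreshed on first use
        refresh_token=os.environ["GMAIL_REFRESH_TOKEN"],
        token_uri="https://oauth2.googleapis.com/token",
        client_id=os.environ["GMAIL_CLIENT_ID"],
        client_secret=os.environ["GMAIL_CLIENT_SECRET"],
        scopes=[GMAIL_SEND_SCOPE],
    )
    try:
        service = build("gmail", "v1", credentials=creds, cache_discovery=False)
        body = build_gmail_body(sender, recipient, subject, text)
        service.users().messages().send(userId="me", body=body).execute()
        return "delivered"
    except Exception:
        return "failed"
```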
+
+  - [ ] 31.4 Implement rate limiting via Redis
+    - Before each send, check Redis counter `stonks:trading:notification_rate:{channel}` with 1-hour TTL
+    - If counter >= limit (10 SMS/hour, 20 emails/hour), mark notification as `rate_limited` and skip delivery
+    - Increment counter on successful delivery
+    - _Requirements: 19.7_
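
A sketch of the 31.4 counter, assuming a `redis.asyncio`-style client and hypothetical channel names `sms` and `email`:

```python
from __future__ import annotations

from typing import Any

RATE_LIMITS = {"sms": 10, "email": 20}  # deliveries per hour, from the task description
WINDOW_SECONDS = 60 * 60


def rate_key(channel: str) -> str:
    return f"stonks:trading:notification_rate:{channel}"


async def is_rate_limited(redis: Any, channel: str) -> bool:
    """Checked before sending; the caller marks the notification `rate_limited` if True."""
    count = await redis.get(rate_key(channel))
    return count is not None and int(count) >= RATE_LIMITS[channel]


async def record_delivery(redis: Any, channel: str) -> int:
    """Called after a successful send; returns the count in the current window."""
    key = rate_key(channel)
    # Create the counter with its 1-hour TTL only if it does not exist yet, then
    # increment. INCR keeps the existing TTL, so the window is fixed from first use.
    await redis.set(key, 0, ex=WINDOW_SECONDS, nx=True)
    return await redis.incr(key)
```

The check and the increment are separate calls, so concurrent dispatches can overshoot the limit slightly; for notification throttling that is usually acceptable.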
+
+  - [ ] 31.5 Implement retry with exponential backoff
+    - On delivery failure, retry up to 3 times with delays: 1s, 2s, 4s
+    - Update notification record with retry_count and error_message on final failure
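+    - Never block trading operations: run dispatch in a separate `asyncio.create_task()`, and run blocking SDK calls (`boto3` publish, Gmail `execute()`) via `asyncio.to_thread()` so they do not stall the event loop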
+    - _Requirements: 19.11_
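
The retry policy in 31.5 (one attempt plus up to three retries after 1 s, 2 s, and 4 s) as a small helper. `send` is a hypothetical zero-argument coroutine wrapping `_send_sns()` or `_send_gmail()`; `sleep` is injectable so tests need not wait:

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def deliver_with_retry(
    send: Callable[[], Awaitable[str]],
    delays: tuple[float, ...] = (1.0, 2.0, 4.0),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> tuple[str, int, str | None]:
    """Return (status, retry_count, last_error); send() returns "delivered" or "failed"."""

    async def attempt() -> tuple[str, str | None]:
        try:
            return await send(), None
        except Exception as exc:  # an exception counts as a failed delivery
            return "failed", repr(exc)

    status, error = await attempt()
    retries = 0
    for delay in delays:
        if status == "delivered":
            break
        await sleep(delay)  # exponential backoff: 1s, 2s, 4s
        retries += 1
        status, error = await attempt()
    return status, retries, error
```

The dispatcher would run this inside `asyncio.create_task()`, keep a reference to the task until it finishes, and then write `retry_count` and the last error to the notification record.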
+
+  - [ ] 31.6 Implement daily summary at 16:30 ET
+    - Add `_daily_summary_scheduler()` coroutine: sleep until 16:30 ET each trading day
+    - Compute daily metrics from `portfolio_snapshots` and current portfolio state
+    - Format summary message with: daily P&L, total portfolio value, Active/Reserve Pool balances, trade count, current Risk Tier, circuit breaker status
+    - Dispatch via `self.dispatch("daily_summary", summary_message)`
+    - _Requirements: 19.3_
+
+- [ ] 32. Wire backtest replay
+  - [ ] 32.1 Create `services/trading/backtest_replay.py` with `BacktestReplay` class
+    - Accept `pool: asyncpg.Pool` in constructor
+    - Implement `async def run(config: BacktestConfig) -> BacktestResult` (it queries the pool and runs inside an `asyncio.Task`, so it must be a coroutine)
+    - _Requirements: 15.1_
+
+  - [ ] 32.2 Fetch historical recommendations and price data
+    - Query `recommendations` table for date range: `WHERE generated_at BETWEEN $1 AND $2 AND action IN ('buy','sell') ORDER BY generated_at ASC`
+    - Query market data tables for historical daily close prices within the date range
+    - Build a day-by-day timeline of recommendations and prices
+    - Handle missing data gracefully: skip dates with no price data, note gaps in result
+    - _Requirements: 15.1, 15.2_
+
+  - [ ] 32.3 Simulate full decision logic chronologically
+    - Initialize simulated portfolio state with `config.initial_capital` and configured risk tier
+    - For each trading day in the date range, process recommendations through `evaluate_recommendation()` using historical prices
+    - Simulate stop-loss/take-profit crossings using historical intraday or daily price data
+    - Simulate reserve pool siphoning on profitable closes
+    - Simulate circuit breaker triggers based on simulated daily P&L
+    - Simulate risk tier auto-adjustment at daily close
+    - Simulate weekly rebalancing on Mondays
+    - Track equity curve: `[{date, portfolio_value}]` for each trading day
+    - _Requirements: 15.2_
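
For the equity curve in 32.3 and the metrics persisted in 32.4, a sketch of daily mark-to-market plus total return and maximum drawdown. It assumes each simulated day is a dict with `date`, `cash`, `holdings` (ticker to quantity), and `closes` (ticker to close price); that shape is hypothetical:

```python
from __future__ import annotations


def build_equity_curve(days: list[dict]) -> tuple[list[dict], list[str]]:
    """Mark the simulated portfolio to market each day; skip days missing a needed close."""
    curve: list[dict] = []
    gaps: list[str] = []
    for day in days:
        closes = day["closes"]
        if any(ticker not in closes for ticker in day["holdings"]):
            gaps.append(day["date"])  # noted in the result instead of guessing a price
            continue
        value = day["cash"] + sum(qty * closes[t] for t, qty in day["holdings"].items())
        curve.append({"date": day["date"], "portfolio_value": round(value, 2)})
    return curve, gaps


def total_return(curve: list[dict]) -> float:
    return curve[-1]["portfolio_value"] / curve[0]["portfolio_value"] - 1.0


def max_drawdown(curve: list[dict]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = 0.0
    worst = 0.0
    for point in curve:
        value = point["portfolio_value"]
        peak = max(peak, value)
        if peak > 0:
            worst = max(worst, (peak - value) / peak)
    return worst
```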
+
+  - [ ] 32.4 Persist results to `backtest_runs` and `backtest_trades`
+    - INSERT into `backtest_runs` with config, result metrics (total_return, sharpe_ratio, max_drawdown, win_rate, profit_factor, trade_count), equity_curve JSONB, status='completed'
+    - INSERT into `backtest_trades` for each simulated trade with ticker, side, entry/exit prices, quantity, pnl, dates, hold_duration, recommendation_id
+    - On mid-run error: persist partial results with status='failed' and error message
+    - _Requirements: 15.4_
+
+  - [ ] 32.5 Wire into `POST /api/trading/backtest` endpoint in `services/trading/app.py`
+    - Replace placeholder `launch_backtest()` with real implementation
+    - Instantiate `BacktestReplay(pool=engine.pool)` and call `run()` in a background `asyncio.Task`
+    - Return `backtest_id` immediately
+    - Update `GET /api/trading/backtest/{id}` to query `backtest_runs` and `backtest_trades` from PostgreSQL
+    - _Requirements: 15.5_
+
+- [ ] 33. Wire real connections in `services/trading/app.py` lifespan
+  - [ ] 33.1 Replace `pool=None` with `asyncpg.create_pool()`
+    - In the lifespan `async with` block, create pool: `pool = await asyncpg.create_pool(dsn=config.postgres.dsn, min_size=2, max_size=10)`
+    - Pass `pool` to `TradingEngine(pool=pool, ...)`
+    - In lifespan exit: `await pool.close()`
+    - _Requirements: 1.6, 18.5_
+
+  - [ ] 33.2 Replace `redis=None` with `aioredis.from_url()`
+    - In the lifespan block, create Redis client: `redis_client = aioredis.from_url(config.redis.url)`
+    - Pass `redis_client` to `TradingEngine(pool=pool, redis=redis_client, ...)`
+    - In lifespan exit: `await redis_client.close()`
+    - _Requirements: 1.5_
+
+  - [ ] 33.3 Add proper error handling and cleanup in lifespan
+    - Wrap pool/redis creation in try/except — log critical error and raise if connections fail
+    - Ensure `engine.stop()`, `pool.close()`, and `redis_client.close()` are called in the `finally` block of lifespan exit
+    - Log connection details (host, port, database, but never passwords) at startup for debugging
+    - _Requirements: 1.6, 18.5_
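
A sketch of the lifespan wiring in task 33 with the connection factories injected, which also makes 36.5 easy to test with fakes. It assumes redis-py's `redis.asyncio` client (the maintained successor to `aioredis`), whose `from_url()` is synchronous and whose clients provide `aclose()` in redis-py 5.0.1 and later; with `aioredis` 2.x the equivalent call is `close()`. `make_lifespan` is a hypothetical name:

```python
from __future__ import annotations

import logging
from contextlib import asynccontextmanager
from typing import Any, Awaitable, Callable

log = logging.getLogger("trading.app")


def make_lifespan(
    create_pool: Callable[[], Awaitable[Any]],
    create_redis: Callable[[], Any],
    make_engine: Callable[[Any, Any], Any],
):
    @asynccontextmanager
    async def lifespan(app: Any):
        pool = redis_client = engine = None
        try:
            try:
                pool = await create_pool()
                redis_client = create_redis()  # from_url() does not connect until first use
            except Exception:
                log.critical("could not create PostgreSQL/Redis clients", exc_info=True)
                raise
            engine = make_engine(pool, redis_client)
            await engine.start()
            app.state.engine = engine
            yield
        finally:
            # Release in reverse order of creation; only what was created is closed.
            if engine is not None:
                await engine.stop()
            if redis_client is not None:
                await redis_client.aclose()
            if pool is not None:
                await pool.close()

    return lifespan
```

In the service, the factories would be along the lines of `lambda: asyncpg.create_pool(dsn=config.postgres.dsn, min_size=2, max_size=10)` and `lambda: redis.asyncio.from_url(config.redis.url)`, with the result passed to `FastAPI(lifespan=...)`.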
+
+- [ ] 34. Checkpoint — Ensure all live wiring compiles and existing tests still pass
+  - Ensure all tests pass, ask the user if questions arise.
+
+- [ ] 35. Enable paper trading configuration
+  - [ ] 35.1 Update `trading_engine_config` defaults for paper trading
+    - Add SQL migration 019 that updates the default `trading_engine_config` row: `enabled=true`, `risk_tier='conservative'`, `absolute_position_cap=25.0` (conservative for initial paper trading)
+    - Set `polling_interval_seconds=60`, `max_open_positions=5` (conservative start)
+    - _Requirements: 16.1, 5.1_
+
+  - [ ] 35.2 Add `TRADING_ENABLED=true` to Helm values for trading-engine deployment
+    - Update `infra/helm/stonks-oracle/values.yaml` to set `TRADING_ENABLED: "true"` in the trading-engine environment variables
+    - Ensure `TRADING_RISK_TIER: "conservative"` and `TRADING_ABSOLUTE_POSITION_CAP: "25.0"` are set
+    - _Requirements: 16.1_
+
+  - [ ] 35.3 Verify trading-engine pod starts and readiness probe passes
+    - After deployment, confirm the trading-engine pod reaches `Running` state
+    - Confirm `GET /ready` returns `{"ready": true}` once portfolio state is loaded
+    - Confirm `GET /health` returns `{"status": "ok"}`
+    - Confirm `GET /api/trading/status` returns the expected configuration (enabled=true, risk_tier=conservative)
+    - _Requirements: 1.7, 16.2_
+
+- [ ] 36. Write integration tests for live wiring
+  - [ ]* 36.1 Test decision loop with mocked PostgreSQL and Redis
+    - Create `tests/test_trading_integration.py`
+    - Mock `asyncpg.Pool` to return canned recommendation rows and portfolio state
+    - Mock Redis client for deduplication checks and broker queue pushes
+    - Verify the decision loop polls recommendations, evaluates them, persists decisions, and pushes "act" orders to the broker queue
+    - Verify deduplication prevents double-processing
+    - Verify skip decisions are persisted with correct reasons
+    - _Requirements: 1.1, 1.2, 1.3, 1.4, 1.5_
+
+  - [ ]* 36.2 Test stop-loss monitor with mocked price API
+    - Mock `_fetch_current_prices()` to return prices that cross stop-loss and take-profit levels
+    - Verify sell orders are generated and pushed to broker queue for triggered positions
+    - Verify no orders generated when prices are between stop and take-profit
+    - Test safety sell: mock price fetch returning empty for > 15 minutes, verify position closed
+    - _Requirements: 4.4, 4.5, 4.8_
+
+  - [ ]* 36.3 Test notification dispatch with mocked SNS and Gmail
+    - Mock `boto3.client("sns")` and Gmail API service
+    - Verify SNS publish called with correct topic ARN and message for SMS-enabled events
+    - Verify Gmail send called with correct sender/recipient for email-enabled events
+    - Verify rate limiting: send 11 SMS in one hour, verify 11th is marked `rate_limited`
+    - Verify retry: mock first delivery failure, verify retry with backoff, verify final success
+    - _Requirements: 19.1, 19.5, 19.6, 19.7, 19.11_
+
+  - [ ]* 36.4 Test backtest replay end-to-end
+    - Mock PostgreSQL pool to return historical recommendations and price data
+    - Run `BacktestReplay.run()` with a small date range and $500 initial capital
+    - Verify backtest result contains expected metrics (total_return, sharpe_ratio, max_drawdown, win_rate, trade_count)
+    - Verify the equity curve has one entry per trading day that has price data
+    - Verify trades are persisted to `backtest_trades` (mocked INSERT calls)
+    - _Requirements: 15.1, 15.2, 15.3, 15.4_
+
+  - [ ]* 36.5 Test lifespan creates real pool and redis connections
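+    - Drive the FastAPI `app` with `fastapi.testclient.TestClient` used as a context manager (or `httpx.AsyncClient` wrapped in `asgi-lifespan`'s `LifespanManager`, since `httpx`'s ASGI transport does not run lifespan events), and mock `asyncpg.create_pool` / `aioredis.from_url`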
+    - Verify pool and redis are created during startup and closed during shutdown
+    - Verify engine receives non-None pool and redis
+    - _Requirements: 1.6, 18.5_
+
+- [ ] 37. Final checkpoint — Verify paper trading is operational
+  - Ensure all tests pass, ask the user if questions arise.
+  - Trading engine pod is running and ready
+  - Decision loop is polling recommendations from PostgreSQL
+  - Stop-loss monitor is checking prices at configured interval
+  - Performance metrics are being computed every 5 minutes during market hours
+  - Dashboard shows trading engine status as enabled with conservative tier
+
+## Phase 2 Notes
+
+- Phase 2 tasks build on the completed Phase 1 pure computation modules — no Phase 1 code is rewritten, only the lifecycle stubs are replaced
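+- All async loops use the `while self.running` pattern with `asyncio.sleep()`; because the schedulers can sleep for hours, `stop()` also cancels the tasks instead of waiting for each loop to observe the flag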
+- Database connections are created in the FastAPI lifespan and passed to the engine — no global connection state
+- Integration tests use mocked database pools and Redis clients to avoid requiring live infrastructure
+- Paper trading starts with conservative settings (risk_tier=conservative, absolute_position_cap=$25, max_open_positions=5) to validate behavior before scaling up
+- Tasks 36.x (integration tests) are marked optional (`*`) — they can be skipped for faster deployment but are recommended