feat: comprehensive docs, unit tests, docker-compose app services

- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide,
  3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide,
  backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive,
  sanitized-pipeline-docs
Celes Renata
2026-04-22 02:56:41 +00:00
parent f251c53f92
commit 88ad1e8d99
57 changed files with 13318 additions and 51 deletions
@@ -0,0 +1,440 @@
# Backup and Restore Guide
This guide documents every backup and restore script in the Stonks Oracle platform: CLI options, storage locations, retention policies, and disaster-recovery procedures.
## Overview
Stonks Oracle provides two tiers of backup tooling:
| Tier | Scripts | Scope | Storage |
|------|---------|-------|---------|
| **Local (kubectl-based)** | `backup-db.sh`, `restore-db.sh`, `backup-redis.sh` | Individual data stores, streamed to the operator's machine | `~/backups/stonks-oracle/` (local filesystem) |
| **Cluster (Kubernetes Job)** | `backup.sh`, `restore.sh` | Full platform (PostgreSQL + all MinIO buckets) | NFS share at `192.168.42.8:/volume1/Kubernetes/stonks` |
All scripts live in the `scripts/` directory and require `kubectl` access to the cluster.
---
## Local Backup Scripts
### `backup-db.sh` — PostgreSQL Database Backup
Creates a compressed `pg_dump` of the `stonks` database and optionally uploads it to MinIO.
**Usage:**
```bash
./scripts/backup-db.sh # backup to local file
./scripts/backup-db.sh --upload-minio # backup + upload to MinIO
```
**CLI Arguments:**
| Argument | Required | Description |
|----------|----------|-------------|
| `--upload-minio` | No | Upload the backup file to the `stonks-backups` MinIO bucket after creating it |
**Environment Variables:**
| Variable | Default | Description |
|----------|---------|-------------|
| `BACKUP_DIR` | `~/backups/stonks-oracle` | Local directory where backup files are stored |
**What it captures:**
- Full `pg_dump` of the `stonks` database (all tables, data, sequences)
- Dump flags: `--no-owner --no-privileges --clean --if-exists`
- Output format: gzip-compressed SQL (`.sql.gz`)
**How it works:**
1. Runs `pg_dump` inside the PostgreSQL pod (`postgresql-1` in `postgresql-service` namespace) and streams the compressed output to the local machine
2. Validates the backup is non-empty and counts tables as a sanity check
3. If `--upload-minio` is specified, attempts to create the `stonks-backups` bucket (if it doesn't exist) and stages the file for upload
4. Prunes old backups, keeping only the last 7 files matching `stonks-*.sql.gz`
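
The pruning in step 4 can be sketched as follows. This is a minimal illustration, not the script's actual code: `prune_old_backups` is a hypothetical helper, and it assumes the timestamped file names sort chronologically, so the shell's sorted glob expansion lists the oldest files first.

```bash
# Hypothetical helper (not taken from backup-db.sh): keep only the newest
# $2 files matching stonks-*.sql.gz in directory $1 (default: keep 7).
prune_old_backups() {
  dir="$1"
  keep="${2:-7}"

  # Count the matching files; an unmatched glob stays literal, hence the -e test.
  count=0
  for f in "$dir"/stonks-*.sql.gz; do
    if [ -e "$f" ]; then count=$((count + 1)); fi
  done

  # Glob results are sorted, and YYYYMMDD-HHMMSS names sort oldest first,
  # so deleting from the front removes the oldest backups.
  excess=$((count - keep))
  for f in "$dir"/stonks-*.sql.gz; do
    if [ "$excess" -le 0 ]; then break; fi
    if [ -e "$f" ]; then
      rm -f -- "$f"
      excess=$((excess - 1))
    fi
  done
}
```

For example, `prune_old_backups "${BACKUP_DIR:-$HOME/backups/stonks-oracle}" 7` would mirror the documented retention of seven files.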
**Storage:**
- Local path: `~/backups/stonks-oracle/stonks-<YYYYMMDD-HHMMSS>.sql.gz`
- MinIO bucket (optional): `stonks-backups`
**Retention:** Keeps the last 7 backups. Older files matching `stonks-*.sql.gz` in the backup directory are automatically deleted.
---
### `backup-redis.sh` — Redis State Backup
Triggers a Redis `BGSAVE` and copies the RDB dump file to the local machine.
**Usage:**
```bash
./scripts/backup-redis.sh
```
**CLI Arguments:** None.
**Environment Variables:**
| Variable | Default | Description |
|----------|---------|-------------|
| `BACKUP_DIR` | `~/backups/stonks-oracle` | Local directory where the RDB file is stored |
| `REDIS_PASSWORD` | `PSCh4ng3me!` | Redis authentication password |
**What it captures:**
- Redis RDB snapshot (`dump.rdb`) containing all in-memory state: deduplication markers, queue contents, rate-limit counters, cached values
**How it works:**
1. Triggers `BGSAVE` on the Redis master pod (`redis-master-0` in `redis-service` namespace)
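2. Waits a fixed 5 seconds for the background save, then logs the `LASTSAVE` timestamp. `BGSAVE` is asynchronous, so on a large dataset the save may still be running when the wait ends; check that the logged `LASTSAVE` is later than the time the script started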
3. Copies the RDB file from the pod. Tries `/data/dump.rdb` first, then falls back to `/var/lib/redis/dump.rdb` and `/bitnami/redis/data/dump.rdb`
4. Prints Redis keyspace statistics for verification
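
Because `BGSAVE` returns before the snapshot is written, one way to make step 2 more robust is to poll `LASTSAVE` until it advances. The sketch below is an assumption about how that could look, not what `backup-redis.sh` does today; `redis_lastsave` and `wait_for_bgsave` are hypothetical helpers, and `REDIS_PASSWORD` is assumed to be exported.

```bash
# Hypothetical helper: print the Unix time of the last successful save.
# Assumes REDIS_PASSWORD is set in the environment.
redis_lastsave() {
  kubectl exec -n redis-service redis-master-0 -- \
    redis-cli -a "${REDIS_PASSWORD}" --no-auth-warning LASTSAVE
}

# Hypothetical helper: poll until LASTSAVE moves past $1, or give up
# after $2 seconds (default 60). Returns 0 on success, 1 on timeout.
wait_for_bgsave() {
  before="$1"
  timeout="${2:-60}"
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    now="$(redis_lastsave | tr -d '\r')"
    # A non-numeric reply makes the test fail, which is treated as "not yet".
    if [ "$now" -gt "$before" ] 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

A caller would record `before="$(redis_lastsave)"`, issue `BGSAVE`, and then run `wait_for_bgsave "$before" 60` before copying the RDB file. `LASTSAVE` has one-second resolution, so a save that completes within the same second as the previous one would not be detected by this check.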
**Storage:**
- Local path: `~/backups/stonks-oracle/redis-<YYYYMMDD-HHMMSS>.rdb`
**Retention:** No automatic pruning. Old Redis backups accumulate and must be cleaned up manually.
---
### `restore-db.sh` — PostgreSQL Database Restore
Restores a `pg_dump` backup into the `stonks` database with full service scale-down/scale-up.
**Usage:**
```bash
./scripts/restore-db.sh <backup-file.sql.gz>
./scripts/restore-db.sh ~/backups/stonks-oracle/stonks-20260415-180000.sql.gz
```
If called without arguments, lists available backups in `~/backups/stonks-oracle/`.
**CLI Arguments:**
| Argument | Required | Description |
|----------|----------|-------------|
| `<backup-file.sql.gz>` | Yes | Path to the gzip-compressed SQL backup file to restore |
**What it restores:**
- All tables, data, sequences, and indexes in the `stonks` database
- Re-grants `ALL PRIVILEGES` to the `stonks` user on all tables and sequences after restore
**Service scale-down/scale-up procedure:**
1. **Terminates active connections** — Runs `pg_terminate_backend()` for all connections to the `stonks` database
2. **Scales down all deployments** in the `stonks-oracle` namespace to 0 replicas to prevent reconnections
3. **Waits 10 seconds** for pods to terminate
4. **Restores the backup** using `psql --single-transaction` (piped from `zcat`)
5. **Re-grants permissions** to the `stonks` user
6. **Verifies** the restore by counting tables
7. **Scales all deployments back to 1 replica**, then scales `ingestion` and `parser` to 2 replicas
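
The core of this procedure can be sketched as below. It is an illustration under stated assumptions rather than the contents of `restore-db.sh`: the confirmation prompt and table-count check are omitted, `pg_exec` and `restore_stonks_db` are hypothetical names, the `public` schema in the `GRANT` statements is assumed, `gzip -dc` stands in for `zcat`, and `ON_ERROR_STOP` is an addition so that a failed statement aborts the transaction.

```bash
# Hypothetical wrapper: run a command inside the PostgreSQL pod.
pg_exec() {
  kubectl exec -i -n postgresql-service postgresql-1 -c postgres -- "$@"
}

# Hypothetical sketch of the restore sequence; $1 is the .sql.gz backup.
restore_stonks_db() {
  backup_file="$1"
  if [ ! -s "$backup_file" ]; then
    echo "backup file missing or empty: $backup_file" >&2
    return 1
  fi

  # Step 1: drop existing connections to the stonks database.
  pg_exec psql -U postgres -c \
    "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'stonks' AND pid <> pg_backend_pid();" \
    </dev/null || return 1

  # Steps 2-3: stop every deployment so nothing reconnects, then let pods exit.
  kubectl scale deployment --all --replicas=0 -n stonks-oracle || return 1
  sleep 10

  # Step 4: stream the decompressed dump into psql as a single transaction.
  gzip -dc "$backup_file" |
    pg_exec psql -U postgres -d stonks --single-transaction -v ON_ERROR_STOP=1 || return 1

  # Step 5: re-grant privileges (schema name "public" is an assumption).
  pg_exec psql -U postgres -d stonks -c \
    "GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO stonks; GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO stonks;" \
    </dev/null || return 1

  # Step 7: bring services back, with two replicas for ingestion and parser.
  kubectl scale deployment --all --replicas=1 -n stonks-oracle || return 1
  kubectl scale deployment/ingestion deployment/parser --replicas=2 -n stonks-oracle
}
```

In this sketch a failure after the scale-down leaves the deployments at 0 replicas, so the operator would need to scale them back up manually once the cause is fixed.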
**Data loss implications:**
> **WARNING:** This replaces ALL data in the `stonks` database with the backup contents. Any data written after the backup was taken is permanently lost. The script requires interactive confirmation — you must type `yes` to proceed.
---
## Cluster Backup Scripts (Kubernetes Jobs)
### `backup.sh` — Full Platform Backup (PostgreSQL + MinIO)
Runs a Kubernetes Job that backs up both PostgreSQL and all MinIO buckets to an NFS share.
**Usage:**
```bash
bash scripts/backup.sh
```
**CLI Arguments:** None.
**What it captures:**
- **PostgreSQL**: Full `pg_dump` in custom format (`-Fc`) as `stonks.pgdump`
- **MinIO buckets** (8 buckets mirrored):
  - `stonks-raw-market` — Raw market data from Polygon.io
  - `stonks-raw-news` — Raw news articles
  - `stonks-raw-filings` — Raw SEC filings
  - `stonks-normalized` — Normalized documents
  - `stonks-llm-prompts` — LLM prompt logs
  - `stonks-llm-results` — LLM extraction results
  - `stonks-lakehouse` — Parquet fact tables for Trino
  - `stonks-audit` — Audit trail artifacts
- **Manifest**: `manifest.json` with backup name, timestamp, and bucket list
**How it works:**
1. Deletes any previous `stonks-backup` Job
2. Creates a Kubernetes Job using `postgres:18-alpine` with NFS volume mount and MinIO credentials from cluster secrets
3. Inside the Job container:
   - Runs `pg_dump` with credentials from `stonks-config` ConfigMap and `stonks-core-secrets` Secret
   - Installs the MinIO client (`mc`) and mirrors each bucket to the NFS backup directory
   - Writes a `manifest.json` and updates the `latest` symlink
4. Waits up to 600 seconds (10 minutes) for the Job to complete
5. Kubernetes deletes the finished Job automatically after 300 seconds (`ttlSecondsAfterFinished`)
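
The wait in step 4 could be expressed with `kubectl wait`, as in the sketch below. This is an assumption about the approach, not a copy of `backup.sh`; `wait_for_backup_job` is a hypothetical helper, and the `stonks-oracle` namespace for the Job is assumed.

```bash
# Hypothetical helper: block until the Job completes or the timeout expires.
# $1 = Job name, $2 = namespace (assumed), $3 = timeout in seconds.
wait_for_backup_job() {
  job="${1:-stonks-backup}"
  ns="${2:-stonks-oracle}"
  timeout="${3:-600}"

  if kubectl wait --for=condition=complete "job/${job}" -n "$ns" --timeout="${timeout}s"; then
    return 0
  fi

  # On timeout or error, show recent container output to aid debugging.
  kubectl logs "job/${job}" -n "$ns" --tail=50 >&2 || true
  return 1
}
```

Note that waiting on `condition=complete` does not return early when the Job fails; a failed Job is only reported once the timeout expires. Because the Job is removed 300 seconds after finishing, logs should be collected within that window.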
**Storage:**
- NFS path: `192.168.42.8:/volume1/Kubernetes/stonks/<backup-name>/`
- Directory structure:
  ```
  stonks-backup-YYYYMMDD-HHMMSS/
  ├── stonks.pgdump              # PostgreSQL custom-format dump
  ├── manifest.json              # Backup metadata
  └── minio/
      ├── stonks-raw-market/     # Mirrored bucket contents
      ├── stonks-raw-news/
      ├── stonks-raw-filings/
      ├── stonks-normalized/
      ├── stonks-llm-prompts/
      ├── stonks-llm-results/
      ├── stonks-lakehouse/
      └── stonks-audit/
  ```
- A `latest` symlink always points to the most recent backup
**Retention:** No automatic pruning on NFS. Old backups must be cleaned up manually.
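
Since cleanup on NFS is manual, a dry-run listing of candidates can help. The sketch below is a hypothetical helper, not part of the platform's scripts: it assumes the NFS share is mounted locally at the path passed as `$1`, and it never lists the directory that the `latest` symlink points to.

```bash
# Hypothetical helper: print backup directories under $1 whose modification
# time is more than $2 days old (default 30), excluding the "latest" target.
list_stale_backups() {
  root="$1"
  days="${2:-30}"

  # readlink prints the symlink target, e.g. stonks-backup-20260419-030000.
  latest="$(readlink "$root/latest" 2>/dev/null || true)"

  find "$root" -maxdepth 1 -type d -name 'stonks-backup-*' -mtime +"$days" |
    while IFS= read -r dir; do
      # Compare final path components so relative and absolute targets both match.
      if [ "${dir##*/}" != "${latest##*/}" ]; then
        echo "$dir"
      fi
    done
}
```

The function only prints paths. Deleting them is left to the operator after reviewing the list.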
---
### `restore.sh` — Full Platform Restore (PostgreSQL + MinIO)
Runs a Kubernetes Job that restores both PostgreSQL and MinIO buckets from an NFS backup.
**Usage:**
```bash
bash scripts/restore.sh # restore from "latest" symlink
bash scripts/restore.sh <backup-name> # restore a specific backup
```
**CLI Arguments:**
| Argument | Required | Description |
|----------|----------|-------------|
| `<backup-name>` | No | Name of the backup directory on NFS. Defaults to `latest` (symlink to most recent backup) |
**What it restores:**
- **PostgreSQL**: Full database restore using `pg_restore --clean --if-exists --no-owner --no-acl`
- **MinIO buckets**: All 8 buckets mirrored back with `mc mirror --overwrite`
**How it works:**
1. Prints a warning and gives 5 seconds to abort (Ctrl+C)
2. Deletes any previous `stonks-restore` Job
3. Creates a Kubernetes Job that:
   - Validates the backup exists (`stonks.pgdump` file present)
   - Restores PostgreSQL using `pg_restore` with `--clean` (drops and recreates objects)
   - Installs `mc` and mirrors each bucket back from NFS to MinIO
   - Verifies the restore by querying row counts for key tables (companies, documents, intelligence, impacts, trends, recommendations)
4. Waits up to 600 seconds for the Job to complete
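
Before starting a restore, the backup directory can be checked by hand in the same spirit as the Job's validation step. The following is a sketch under assumptions: `validate_backup_dir` is a hypothetical helper, and the backup directory is assumed to be reachable on the local filesystem (for example through an NFS mount).

```bash
# Hypothetical pre-flight check: $1 is the backup directory as mounted locally.
# Prints the names of the mirrored buckets it finds; returns 1 if the dump
# or the minio/ directory is missing.
validate_backup_dir() {
  dir="$1"

  if [ ! -s "$dir/stonks.pgdump" ]; then
    echo "missing or empty: $dir/stonks.pgdump" >&2
    return 1
  fi
  if [ ! -d "$dir/minio" ]; then
    echo "missing MinIO mirror directory: $dir/minio" >&2
    return 1
  fi

  # List the buckets present so they can be compared with the expected 8.
  for bucket in "$dir"/minio/*/; do
    if [ -d "$bucket" ]; then basename "$bucket"; fi
  done
}
```

Where `pg_restore` is installed, `pg_restore --list "$dir/stonks.pgdump"` additionally confirms that the custom-format archive is readable without touching the database.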
**Data loss implications:**
> **WARNING:** This will DROP and recreate all objects in the `stonks` database. All MinIO bucket contents are overwritten. Any data written after the backup was taken is permanently lost. The script provides a 5-second abort window before proceeding.
**Post-restore steps:**
After the restore completes, restart all services to pick up the restored state:
```bash
kubectl rollout restart deployment -n stonks-oracle --all
```
---
## MinIO Upload Option (`--upload-minio`)
The `backup-db.sh` script supports `--upload-minio` for off-host storage of database backups. When enabled:
1. The script connects to MinIO through an ingestion pod in the `stonks-oracle` namespace
2. Creates the `stonks-backups` bucket if it doesn't already exist
3. Stages the backup file for upload
This provides a second copy of the database backup on object storage, separate from the operator's local filesystem. The full cluster backup (`backup.sh`) stores backups on NFS and does not use this flag — it backs up MinIO bucket *contents* rather than uploading database dumps *to* MinIO.
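
For comparison, an upload made directly from a host that has the MinIO client configured might look like the sketch below. This is not how `backup-db.sh` works (it goes through an ingestion pod); `upload_backup_to_minio` and the `stonks` alias are hypothetical, and the alias is assumed to have been created beforehand with `mc alias set`.

```bash
# Hypothetical helper: copy a local backup file into the stonks-backups bucket.
# $1 = backup file, $2 = mc alias name (assumed to exist already).
upload_backup_to_minio() {
  file="$1"
  alias_name="${2:-stonks}"

  if [ ! -s "$file" ]; then
    echo "backup file missing or empty: $file" >&2
    return 1
  fi

  # Create the bucket only if it is not there yet.
  mc mb --ignore-existing "${alias_name}/stonks-backups" || return 1

  # Keep the original file name as the object key.
  mc cp "$file" "${alias_name}/stonks-backups/$(basename "$file")"
}
```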
---
## Full Nuke and Rebuild Procedure
When a complete platform reset is needed (corrupted state, major schema changes, fresh start), follow this procedure:
### Step 1: Tear Down Services
```bash
bash ~/sources/kube/stonks-oracle/runmelast.sh
```
Run this from `gremlin-1`. The script performs a Helm uninstall, which removes all Kubernetes resources in the `stonks-oracle` namespace. PostgreSQL, MinIO, and Redis data are preserved because those services run in separate namespaces.
### Step 2: Terminate Database Connections
```bash
kubectl exec -n postgresql-service postgresql-1 -c postgres -- \
  psql -U postgres -c \
  "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'stonks' AND pid <> pg_backend_pid();"
```
### Step 3: Drop the Database
```bash
kubectl exec -n postgresql-service postgresql-1 -c postgres -- \
  psql -U postgres -c "DROP DATABASE IF EXISTS stonks;"
```
### Step 4: Flush Redis
Clear all `stonks:*` keys to reset deduplication markers, queue contents, and cached state:
```bash
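# List matching keys, then delete them in batches of 100.
# -r (GNU xargs) skips the DEL call entirely when no keys match.
kubectl exec -n redis-service redis-master-0 -- \
  redis-cli -a 'PSCh4ng3me!' --scan --pattern 'stonks:*' | \
  xargs -r -L 100 kubectl exec -n redis-service redis-master-0 -- \
  redis-cli -a 'PSCh4ng3me!' DEL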
```
### Step 5: Redeploy
```bash
bash ~/sources/kube/stonks-oracle/runmefirst.sh
```
Run this from `gremlin-1`. The script performs:
- Database creation and migration (all `infra/migrations/*.sql` files applied in order)
- Helm install with secrets injected via `--set` flags
- Rolling restart of all deployments
### Step 6: Re-seed the Symbol Registry
```bash
POSTGRES_HOST=postgresql-rw.postgresql-service.svc.cluster.local \
POSTGRES_PASSWORD='St0nks0racl3!' \
POSTGRES_USER=stonks \
POSTGRES_DB=stonks \
.venv/bin/python -m services.symbol_registry.seed
```
This populates the 50 tracked companies across 10 sectors and 46 competitor relationships.
---
## Recommended Backup Schedules
### Daily Database Backup (cron)
Run `backup-db.sh` daily on a machine with `kubectl` access. The built-in retention keeps the last 7 backups automatically.
```cron
# Daily database backup at 2:00 AM
0 2 * * * /path/to/stonks-oracle/scripts/backup-db.sh --upload-minio >> /var/log/stonks-backup.log 2>&1
```
### Weekly Full Backup (cron)
Run the full cluster backup weekly to capture both PostgreSQL and MinIO data on NFS:
```cron
# Weekly full backup (PostgreSQL + MinIO) on Sundays at 3:00 AM
0 3 * * 0 /path/to/stonks-oracle/scripts/backup.sh >> /var/log/stonks-full-backup.log 2>&1
```
### Redis Backup Before Deployments
Redis state is transient (queues, dedup markers, caches) and rebuilds naturally. Back up Redis before major deployments or database resets as a precaution:
```bash
./scripts/backup-redis.sh
```
### Kubernetes CronJobs
For fully automated in-cluster backups, create a CronJob based on the same Job spec used by `backup.sh`:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: stonks-backup
  namespace: stonks-oracle
spec:
  schedule: "0 2 * * *"  # Daily at 2:00 AM UTC
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 3600
      backoffLimit: 1
      template:
        spec:
          restartPolicy: Never
          volumes:
            - name: nfs-backup
              nfs:
                server: 192.168.42.8
                path: /volume1/Kubernetes/stonks
          containers:
            - name: backup
              image: postgres:18-alpine
              volumeMounts:
                - name: nfs-backup
                  mountPath: /backup
              envFrom:
                - configMapRef:
                    name: stonks-config
                - secretRef:
                    name: stonks-core-secrets
              env:
                - name: MINIO_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: stonks-core-secrets
                      key: MINIO_ACCESS_KEY
                - name: MINIO_SECRET_KEY
                  valueFrom:
                    secretKeyRef:
                      name: stonks-core-secrets
                      key: MINIO_SECRET_KEY
              command: ["sh", "-c"]
              args:
                - |
                  set -e
                  apk add --no-cache curl ca-certificates
                  STAMP="stonks-backup-$(date +%Y%m%d-%H%M%S)"
                  DIR="/backup/${STAMP}"
                  mkdir -p "${DIR}/minio"
                  # PostgreSQL backup
                  PGPASSWORD="${POSTGRES_PASSWORD}" pg_dump \
                    -h "${POSTGRES_HOST}" -p "${POSTGRES_PORT}" \
                    -U "${POSTGRES_USER}" -d "${POSTGRES_DB}" \
                    --no-owner --no-acl -Fc \
                    -f "${DIR}/stonks.pgdump"
                  # MinIO backup
                  curl -sL https://dl.min.io/client/mc/release/linux-amd64/mc -o /usr/local/bin/mc
                  chmod +x /usr/local/bin/mc
                  mc alias set backup "http://${MINIO_ENDPOINT}" "${MINIO_ACCESS_KEY}" "${MINIO_SECRET_KEY}" --api S3v4
                  for bucket in stonks-raw-market stonks-raw-news stonks-raw-filings stonks-normalized stonks-llm-prompts stonks-llm-results stonks-lakehouse stonks-audit; do
                    mc mirror "backup/${bucket}" "${DIR}/minio/${bucket}/" 2>/dev/null || true
                  done
                  ln -sfn "${STAMP}" /backup/latest
                  echo "Backup complete: ${DIR}"
```
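
Once such a CronJob exists, it can be triggered on demand to test it without waiting for the schedule. The sketch below uses `kubectl create job --from=cronjob/...`; `run_backup_cronjob_now` and the `-manual-<timestamp>` naming are hypothetical choices for illustration.

```bash
# Hypothetical helper: start a one-off Job from the CronJob and print its name.
# $1 = CronJob name, $2 = namespace.
run_backup_cronjob_now() {
  cronjob="${1:-stonks-backup}"
  ns="${2:-stonks-oracle}"
  job="${cronjob}-manual-$(date +%Y%m%d-%H%M%S)"

  # Send kubectl's own message to stderr so stdout carries only the Job name.
  kubectl create job "$job" --from="cronjob/${cronjob}" -n "$ns" >&2 || return 1
  echo "$job"
}
```

The printed name can then be passed to `kubectl logs "job/<name>" -n stonks-oracle` to follow the run.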
### Recommended Schedule Summary
| What | Frequency | Script | Retention |
|------|-----------|--------|-----------|
| Database only | Daily | `backup-db.sh --upload-minio` | Last 7 (auto-pruned) |
| Full platform (DB + MinIO) | Weekly | `backup.sh` | Manual cleanup on NFS |
| Redis snapshot | Before deployments | `backup-redis.sh` | Manual cleanup |