feat: comprehensive docs, unit tests, docker-compose app services
- Add scheduler and ingestion unit tests (test_scheduler_unit.py, test_ingestion_unit.py)
- Add all 13 app services + dashboard to docker-compose.yml
- Add full documentation suite: API reference, Helm reference, Docker deployment guide, 3 architecture diagrams (K8s, Docker Compose, data pipeline), AI agent guide, backup/restore guide, observability/metrics reference, per-service docs
- Add intelligence pipeline deep-dive docs with Mermaid diagrams
- Update README with documentation index and links
- Add specs for comprehensive-quality-docs, intelligence-pipeline-deep-dive, sanitized-pipeline-docs
@@ -0,0 +1,440 @@
# Backup and Restore Guide

This guide documents every backup and restore script in the Stonks Oracle platform, their CLI options, storage locations, retention policies, and procedures for disaster recovery.

## Overview

Stonks Oracle provides two tiers of backup tooling:

| Tier | Scripts | Scope | Storage |
|------|---------|-------|---------|
| **Local (kubectl-based)** | `backup-db.sh`, `restore-db.sh`, `backup-redis.sh` | Individual data stores, streamed to the operator's machine | `~/backups/stonks-oracle/` (local filesystem) |
| **Cluster (Kubernetes Job)** | `backup.sh`, `restore.sh` | Full platform (PostgreSQL + all MinIO buckets) | NFS share at `192.168.42.8:/volume1/Kubernetes/stonks` |

All scripts live in the `scripts/` directory and require `kubectl` access to the cluster.

---

## Local Backup Scripts

### `backup-db.sh`: PostgreSQL Database Backup

Creates a compressed `pg_dump` of the `stonks` database and optionally uploads it to MinIO.

**Usage:**

```bash
./scripts/backup-db.sh                  # backup to local file
./scripts/backup-db.sh --upload-minio   # backup + upload to MinIO
```

**CLI Arguments:**

| Argument | Required | Description |
|----------|----------|-------------|
| `--upload-minio` | No | Upload the backup file to the `stonks-backups` MinIO bucket after creating it |

**Environment Variables:**

| Variable | Default | Description |
|----------|---------|-------------|
| `BACKUP_DIR` | `~/backups/stonks-oracle` | Local directory where backup files are stored |

**What it captures:**

- Full `pg_dump` of the `stonks` database (all tables, data, sequences)
- Dump flags: `--no-owner --no-privileges --clean --if-exists`
- Output format: gzip-compressed SQL (`.sql.gz`)

**How it works:**

1. Runs `pg_dump` inside the PostgreSQL pod (`postgresql-1` in `postgresql-service` namespace) and streams the compressed output to the local machine
2. Validates the backup is non-empty and counts tables as a sanity check
3. If `--upload-minio` is specified, attempts to create the `stonks-backups` bucket (if it doesn't exist) and stages the file for upload
4. Prunes old backups, keeping only the last 7 files matching `stonks-*.sql.gz`

**Storage:**

- Local path: `~/backups/stonks-oracle/stonks-<YYYYMMDD-HHMMSS>.sql.gz`
- MinIO bucket (optional): `stonks-backups`

**Retention:** Keeps the last 7 backups. Older files matching `stonks-*.sql.gz` in the backup directory are automatically deleted.
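
The keep-the-last-7 rule can be sketched as follows. This is an illustration only, not the code in `backup-db.sh`: `prune_backups` is a hypothetical helper, and it assumes the timestamped file names sort chronologically, so the oldest files come first in a plain name sort.

```bash
# prune_backups DIR PATTERN [KEEP]
# Hypothetical helper: keeps the newest KEEP files (default 7) that match
# PATTERN in DIR and deletes the rest. Relies on YYYYMMDD-HHMMSS names
# sorting oldest-first.
prune_backups() {
  local dir="$1"
  local pattern="$2"
  local keep="${3:-7}"
  local total excess

  total=$(find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l)
  excess=$((total - keep))
  [ "$excess" -gt 0 ] || return 0   # nothing to prune

  # Oldest names sort first, so the first $excess entries are the ones to drop.
  find "$dir" -maxdepth 1 -type f -name "$pattern" | sort | head -n "$excess" |
    while IFS= read -r f; do
      rm -f -- "$f"
    done
}

# Example call (not executed here):
#   prune_backups "${BACKUP_DIR:-$HOME/backups/stonks-oracle}" 'stonks-*.sql.gz' 7
```

Sorting by name rather than by modification time keeps the result stable even if files are copied or touched after they were created.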

---

### `backup-redis.sh`: Redis State Backup

Triggers a Redis `BGSAVE` and copies the RDB dump file to the local machine.

**Usage:**

```bash
./scripts/backup-redis.sh
```

**CLI Arguments:** None.

**Environment Variables:**

| Variable | Default | Description |
|----------|---------|-------------|
| `BACKUP_DIR` | `~/backups/stonks-oracle` | Local directory where the RDB file is stored |
| `REDIS_PASSWORD` | `PSCh4ng3me!` | Redis authentication password |

**What it captures:**

- Redis RDB snapshot (`dump.rdb`) containing all in-memory state: deduplication markers, queue contents, rate-limit counters, cached values

**How it works:**

1. Triggers `BGSAVE` on the Redis master pod (`redis-master-0` in `redis-service` namespace)
2. Waits 5 seconds for the background save to complete, then logs the `LASTSAVE` timestamp
3. Copies the RDB file from the pod. Tries `/data/dump.rdb` first, then falls back to `/var/lib/redis/dump.rdb` and `/bitnami/redis/data/dump.rdb`
4. Prints Redis keyspace statistics for verification
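
A fixed 5-second wait may be too short for a large dataset. One possible alternative, sketched here under the assumption that polling is acceptable, is to check the `rdb_bgsave_in_progress` and `rdb_last_bgsave_status` fields of `INFO persistence`. `bgsave_finished` is a hypothetical helper and is not part of `backup-redis.sh`.

```bash
# bgsave_finished
# Hypothetical helper: reads `redis-cli INFO persistence` output on stdin and
# succeeds only when no background save is running and the last one ended ok.
bgsave_finished() {
  local info
  info=$(tr -d '\r')   # redis-cli output uses CRLF line endings
  printf '%s\n' "$info" | grep -q '^rdb_bgsave_in_progress:0$' &&
    printf '%s\n' "$info" | grep -q '^rdb_last_bgsave_status:ok$'
}

# Possible polling loop (not executed here; pod and namespace names as above):
#   until kubectl exec -n redis-service redis-master-0 -- \
#       redis-cli -a "$REDIS_PASSWORD" INFO persistence | bgsave_finished; do
#     sleep 1
#   done
```

In a real script the loop would also need an upper bound on attempts so a failing save does not block forever.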

**Storage:**

- Local path: `~/backups/stonks-oracle/redis-<YYYYMMDD-HHMMSS>.rdb`

**Retention:** No automatic pruning. Old Redis backups accumulate and must be cleaned up manually.
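
Manual cleanup could be done by age, as in the sketch below. `prune_older_than` is a hypothetical helper and the 14-day threshold in the example call is an arbitrary assumption. It relies on the `-maxdepth` and `-delete` options available in GNU and BSD `find`.

```bash
# prune_older_than DIR PATTERN DAYS
# Hypothetical helper: prints and deletes files in DIR that match PATTERN and
# were last modified more than DAYS days ago.
prune_older_than() {
  local dir="$1"
  local pattern="$2"
  local days="$3"
  find "$dir" -maxdepth 1 -type f -name "$pattern" -mtime +"$days" -print -delete
}

# Example call (not executed here):
#   prune_older_than "${BACKUP_DIR:-$HOME/backups/stonks-oracle}" 'redis-*.rdb' 14
```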

---

### `restore-db.sh`: PostgreSQL Database Restore

Restores a `pg_dump` backup into the `stonks` database with full service scale-down/scale-up.

**Usage:**

```bash
./scripts/restore-db.sh <backup-file.sql.gz>
./scripts/restore-db.sh ~/backups/stonks-oracle/stonks-20260415-180000.sql.gz
```

If called without arguments, lists available backups in `~/backups/stonks-oracle/`.

**CLI Arguments:**

| Argument | Required | Description |
|----------|----------|-------------|
| `<backup-file.sql.gz>` | Yes | Path to the gzip-compressed SQL backup file to restore |

**What it restores:**

- All tables, data, sequences, and indexes in the `stonks` database
- Re-grants `ALL PRIVILEGES` to the `stonks` user on all tables and sequences after restore

**Service scale-down/scale-up procedure:**

1. **Terminates active connections**: runs `pg_terminate_backend()` for all connections to the `stonks` database
2. **Scales down all deployments** in the `stonks-oracle` namespace to 0 replicas to prevent reconnections
3. **Waits 10 seconds** for pods to terminate
4. **Restores the backup** using `psql --single-transaction` (piped from `zcat`)
5. **Re-grants permissions** to the `stonks` user
6. **Verifies** the restore by counting tables
7. **Scales all deployments back to 1 replica**, then scales `ingestion` and `parser` to 2 replicas

**Data loss implications:**

> **WARNING:** This replaces ALL data in the `stonks` database with the backup contents. Any data written after the backup was taken is permanently lost. The script requires interactive confirmation: you must type `yes` to proceed.
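
Because the restore is destructive, it can be worth checking the archive before confirming. The sketch below is a hypothetical pre-flight check and not part of `restore-db.sh`. It assumes a plain-SQL dump in which each table definition starts a line with `CREATE TABLE`, as `pg_dump` normally emits.

```bash
# check_dump FILE
# Hypothetical pre-flight helper: fails on a missing, empty, or corrupt
# archive; otherwise prints the number of CREATE TABLE statements it contains.
check_dump() {
  local file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "not a valid gzip archive: $file" >&2; return 1; }
  # grep -c exits non-zero when the count is 0; the count is still printed.
  gzip -dc "$file" | grep -c '^CREATE TABLE' || true
}

# Example call (not executed here):
#   check_dump ~/backups/stonks-oracle/stonks-20260415-180000.sql.gz
```

A count of 0 for a non-empty archive suggests the file is not the dump it claims to be.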

---

## Cluster Backup Scripts (Kubernetes Jobs)

### `backup.sh`: Full Platform Backup (PostgreSQL + MinIO)

Runs a Kubernetes Job that backs up both PostgreSQL and all MinIO buckets to an NFS share.

**Usage:**

```bash
bash scripts/backup.sh
```

**CLI Arguments:** None.

**What it captures:**

- **PostgreSQL**: Full `pg_dump` in custom format (`-Fc`) as `stonks.pgdump`
- **MinIO buckets** (8 buckets mirrored):
  - `stonks-raw-market`: Raw market data from Polygon.io
  - `stonks-raw-news`: Raw news articles
  - `stonks-raw-filings`: Raw SEC filings
  - `stonks-normalized`: Normalized documents
  - `stonks-llm-prompts`: LLM prompt logs
  - `stonks-llm-results`: LLM extraction results
  - `stonks-lakehouse`: Parquet fact tables for Trino
  - `stonks-audit`: Audit trail artifacts
- **Manifest**: `manifest.json` with backup name, timestamp, and bucket list

**How it works:**

1. Deletes any previous `stonks-backup` Job
2. Creates a Kubernetes Job using `postgres:18-alpine` with NFS volume mount and MinIO credentials from cluster secrets
3. Inside the Job container:
   - Runs `pg_dump` with credentials from `stonks-config` ConfigMap and `stonks-core-secrets` Secret
   - Installs the MinIO client (`mc`) and mirrors each bucket to the NFS backup directory
   - Writes a `manifest.json` and updates the `latest` symlink
4. Waits up to 600 seconds (10 minutes) for the Job to complete
5. Job auto-cleans after 300 seconds (`ttlSecondsAfterFinished`)

**Storage:**

- NFS path: `192.168.42.8:/volume1/Kubernetes/stonks/<backup-name>/`
- Directory structure:
  ```
  stonks-backup-YYYYMMDD-HHMMSS/
  ├── stonks.pgdump          # PostgreSQL custom-format dump
  ├── manifest.json          # Backup metadata
  └── minio/
      ├── stonks-raw-market/ # Mirrored bucket contents
      ├── stonks-raw-news/
      ├── stonks-raw-filings/
      ├── stonks-normalized/
      ├── stonks-llm-prompts/
      ├── stonks-llm-results/
      ├── stonks-lakehouse/
      └── stonks-audit/
  ```
- A `latest` symlink always points to the most recent backup

**Retention:** No automatic pruning on NFS. Old backups must be cleaned up manually.
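
Before deleting older backups by hand, it may help to confirm that the one being kept is complete. The sketch below checks a backup directory against the layout listed above. `verify_backup_layout` is a hypothetical helper, and it assumes the NFS share is mounted somewhere on the operator's machine. An empty bucket may legitimately have no mirrored directory, so a "missing" bucket is a prompt to investigate, not proof of failure.

```bash
# Bucket names as listed in this guide.
BUCKETS="stonks-raw-market stonks-raw-news stonks-raw-filings stonks-normalized stonks-llm-prompts stonks-llm-results stonks-lakehouse stonks-audit"

# verify_backup_layout DIR
# Hypothetical helper: prints one line per missing piece of the expected
# layout and returns non-zero if anything is absent.
verify_backup_layout() {
  local dir="$1"
  local missing=0
  local bucket

  [ -s "$dir/stonks.pgdump" ] || { echo "missing: stonks.pgdump"; missing=1; }
  [ -f "$dir/manifest.json" ] || { echo "missing: manifest.json"; missing=1; }
  for bucket in $BUCKETS; do
    [ -d "$dir/minio/$bucket" ] || { echo "missing: minio/$bucket"; missing=1; }
  done
  return "$missing"
}

# Example call (not executed here; /mnt/stonks-backups is an assumed mount point):
#   verify_backup_layout /mnt/stonks-backups/latest
```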

---

### `restore.sh`: Full Platform Restore (PostgreSQL + MinIO)

Runs a Kubernetes Job that restores both PostgreSQL and MinIO buckets from an NFS backup.

**Usage:**

```bash
bash scripts/restore.sh                  # restore from "latest" symlink
bash scripts/restore.sh <backup-name>    # restore a specific backup
```

**CLI Arguments:**

| Argument | Required | Description |
|----------|----------|-------------|
| `<backup-name>` | No | Name of the backup directory on NFS. Defaults to `latest` (symlink to most recent backup) |
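
Since the default argument is a symlink, it can be useful to see which concrete backup `latest` resolves to before starting a restore. The following sketch assumes the NFS share is mounted locally. `resolve_backup` is a hypothetical helper and is not part of `restore.sh`.

```bash
# resolve_backup ROOT [NAME]
# Hypothetical helper: given the backup root and a backup name (default
# "latest"), prints the concrete backup directory name, following the
# symlink if NAME is one.
resolve_backup() {
  local root="$1"
  local name="${2:-latest}"
  local path="$root/$name"

  [ -d "$path" ] || { echo "no such backup: $name" >&2; return 1; }
  if [ -L "$path" ]; then
    basename "$(readlink "$path")"
  else
    printf '%s\n' "$name"
  fi
}

# Example call (not executed here; /mnt/stonks-backups is an assumed mount point):
#   resolve_backup /mnt/stonks-backups
```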

**What it restores:**

- **PostgreSQL**: Full database restore using `pg_restore --clean --if-exists --no-owner --no-acl`
- **MinIO buckets**: All 8 buckets mirrored back with `mc mirror --overwrite`

**How it works:**

1. Prints a warning and gives 5 seconds to abort (Ctrl+C)
2. Deletes any previous `stonks-restore` Job
3. Creates a Kubernetes Job that:
   - Validates the backup exists (`stonks.pgdump` file present)
   - Restores PostgreSQL using `pg_restore` with `--clean` (drops and recreates objects)
   - Installs `mc` and mirrors each bucket back from NFS to MinIO
   - Verifies the restore by querying row counts for key tables (companies, documents, intelligence, impacts, trends, recommendations)
4. Waits up to 600 seconds for the Job to complete

**Data loss implications:**

> **WARNING:** This will DROP and recreate all objects in the `stonks` database. All MinIO bucket contents are overwritten. Any data written after the backup was taken is permanently lost. The script provides a 5-second abort window before proceeding.

**Post-restore steps:**

After the restore completes, restart all services to pick up the restored state:

```bash
kubectl rollout restart deployment -n stonks-oracle --all
```

---

## MinIO Upload Option (`--upload-minio`)

The `backup-db.sh` script supports `--upload-minio` for off-host storage of database backups. When enabled:

1. The script connects to MinIO through an ingestion pod in the `stonks-oracle` namespace
2. Creates the `stonks-backups` bucket if it doesn't already exist
3. Stages the backup file for upload

This provides a second copy of the database backup on object storage, separate from the operator's local filesystem. The full cluster backup (`backup.sh`) stores backups on NFS and does not use this flag: it backs up MinIO bucket *contents* rather than uploading database dumps *to* MinIO.

---

## Full Nuke and Rebuild Procedure

When a complete platform reset is needed (corrupted state, major schema changes, fresh start), follow this procedure:

### Step 1: Tear Down Services

```bash
bash ~/sources/kube/stonks-oracle/runmelast.sh
```

This runs from `gremlin-1` and performs a Helm uninstall, cleaning up all Kubernetes resources in the `stonks-oracle` namespace. Database, MinIO, and Redis data are preserved (they run in separate namespaces).

### Step 2: Terminate Database Connections

```bash
kubectl exec -n postgresql-service postgresql-1 -c postgres -- \
  psql -U postgres -c \
  "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'stonks' AND pid <> pg_backend_pid();"
```

### Step 3: Drop the Database

```bash
kubectl exec -n postgresql-service postgresql-1 -c postgres -- \
  psql -U postgres -c "DROP DATABASE IF EXISTS stonks;"
```

### Step 4: Flush Redis

Clear all `stonks:*` keys to reset deduplication markers, queue contents, and cached state:

```bash
kubectl exec -n redis-service redis-master-0 -- \
  redis-cli -a 'PSCh4ng3me!' --scan --pattern 'stonks:*' | \
  xargs -L 100 kubectl exec -n redis-service redis-master-0 -- \
  redis-cli -a 'PSCh4ng3me!' DEL
```

### Step 5: Redeploy

```bash
bash ~/sources/kube/stonks-oracle/runmefirst.sh
```

This runs from `gremlin-1` and performs:

- Database creation and migration (all `infra/migrations/*.sql` files applied in order)
- Helm install with secrets injected via `--set` flags
- Rolling restart of all deployments

### Step 6: Re-seed the Symbol Registry

```bash
POSTGRES_HOST=postgresql-rw.postgresql-service.svc.cluster.local \
POSTGRES_PASSWORD='St0nks0racl3!' \
POSTGRES_USER=stonks \
POSTGRES_DB=stonks \
.venv/bin/python -m services.symbol_registry.seed
```

This populates the 50 tracked companies across 10 sectors and 46 competitor relationships.

---

## Recommended Backup Schedules

### Daily Database Backup (cron)

Run `backup-db.sh` daily on a machine with `kubectl` access. The built-in retention keeps the last 7 backups automatically.

```cron
# Daily database backup at 2:00 AM
0 2 * * * /path/to/stonks-oracle/scripts/backup-db.sh --upload-minio >> /var/log/stonks-backup.log 2>&1
```

### Weekly Full Backup (cron)

Run the full cluster backup weekly to capture both PostgreSQL and MinIO data on NFS:

```cron
# Weekly full backup (PostgreSQL + MinIO) on Sundays at 3:00 AM
0 3 * * 0 /path/to/stonks-oracle/scripts/backup.sh >> /var/log/stonks-full-backup.log 2>&1
```

### Redis Backup Before Deployments

Redis state is transient (queues, dedup markers, caches) and rebuilds naturally. Back up Redis before major deployments or database resets as a precaution:

```bash
./scripts/backup-redis.sh
```

### Kubernetes CronJobs

For fully automated in-cluster backups, create a CronJob based on the same Job spec used by `backup.sh`:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: stonks-backup
  namespace: stonks-oracle
spec:
  schedule: "0 2 * * *"  # Daily at 2:00 AM UTC
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 3600
      backoffLimit: 1
      template:
        spec:
          restartPolicy: Never
          volumes:
            - name: nfs-backup
              nfs:
                server: 192.168.42.8
                path: /volume1/Kubernetes/stonks
          containers:
            - name: backup
              image: postgres:18-alpine
              volumeMounts:
                - name: nfs-backup
                  mountPath: /backup
              envFrom:
                - configMapRef:
                    name: stonks-config
                - secretRef:
                    name: stonks-core-secrets
              env:
                - name: MINIO_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: stonks-core-secrets
                      key: MINIO_ACCESS_KEY
                - name: MINIO_SECRET_KEY
                  valueFrom:
                    secretKeyRef:
                      name: stonks-core-secrets
                      key: MINIO_SECRET_KEY
              command: ["sh", "-c"]
              args:
                - |
                  set -e
                  apk add --no-cache curl ca-certificates
                  STAMP="stonks-backup-$(date +%Y%m%d-%H%M%S)"
                  DIR="/backup/${STAMP}"
                  mkdir -p "${DIR}/minio"

                  # PostgreSQL backup
                  PGPASSWORD="${POSTGRES_PASSWORD}" pg_dump \
                    -h "${POSTGRES_HOST}" -p "${POSTGRES_PORT}" \
                    -U "${POSTGRES_USER}" -d "${POSTGRES_DB}" \
                    --no-owner --no-acl -Fc \
                    -f "${DIR}/stonks.pgdump"

                  # MinIO backup
                  curl -sL https://dl.min.io/client/mc/release/linux-amd64/mc -o /usr/local/bin/mc
                  chmod +x /usr/local/bin/mc
                  mc alias set backup "http://${MINIO_ENDPOINT}" "${MINIO_ACCESS_KEY}" "${MINIO_SECRET_KEY}" --api S3v4

                  for bucket in stonks-raw-market stonks-raw-news stonks-raw-filings stonks-normalized stonks-llm-prompts stonks-llm-results stonks-lakehouse stonks-audit; do
                    mc mirror "backup/${bucket}" "${DIR}/minio/${bucket}/" 2>/dev/null || true
                  done

                  ln -sfn "${STAMP}" /backup/latest
                  echo "Backup complete: ${DIR}"
```

### Recommended Schedule Summary

| What | Frequency | Script | Retention |
|------|-----------|--------|-----------|
| Database only | Daily | `backup-db.sh --upload-minio` | Last 7 (auto-pruned) |
| Full platform (DB + MinIO) | Weekly | `backup.sh` | Manual cleanup on NFS |
| Redis snapshot | Before deployments | `backup-redis.sh` | Manual cleanup |