c85c0068a2
- Replace all datetime.utcnow() with datetime.now(tz=timezone.utc) across 8 files - Fix 12 failing tests to match current implementation behavior - Fix pytest_plugins in non-top-level conftest (moved to root conftest.py) - Auto-fix 189 lint issues (import sorting, unused imports) - Add CI/CD pipeline infrastructure (ARC, ArgoCD, Kargo manifests) - Add values-beta.yaml and values-paper.yaml for staged deployments - Update GitHub Actions workflow to use self-hosted-gremlin runners - Add integration-test job to CI pipeline Result: 1596 passed, 0 failed, 0 warnings
97 lines
10 KiB
Markdown
97 lines
10 KiB
Markdown
# Implementation Plan: CI/CD Pipeline
|
||
|
||
## Overview
|
||
|
||
Build a full CI/CD pipeline for Stonks Oracle using ARC (self-hosted GitHub Actions runners), ArgoCD (GitOps deployment), and Kargo (staged promotion orchestration) on the Gremlin cluster. Pipeline infrastructure scripts go in `~/sources/kube/pipelines/` on gremlin-1. Helm values files and the updated GitHub Actions workflow go in the stonks-oracle repo.
|
||
|
||
## Tasks
|
||
|
||
- [x] 1. Create NFS PersistentVolume manifests
|
||
- [x] 1.1 Create `~/sources/kube/pipelines/pvs/argocd-pv.yaml` — NFS PV for ArgoCD (5Gi, `nfs://192.168.42.8:/volume1/Kubernetes/pipelines/argocd`, `persistentVolumeReclaimPolicy: Retain`, label `app: pipeline-argocd`)
|
||
- _Requirements: 1.2, 17.1, 17.4_
|
||
- [x] 1.2 Create `~/sources/kube/pipelines/pvs/kargo-pv.yaml` — NFS PV for Kargo (2Gi, `nfs://192.168.42.8:/volume1/Kubernetes/pipelines/kargo`, `persistentVolumeReclaimPolicy: Retain`, label `app: pipeline-kargo`)
|
||
- _Requirements: 1.2, 17.1, 17.4_
|
||
- [x] 1.3 Create `~/sources/kube/pipelines/pvs/arc-pv.yaml` — NFS PV for ARC (2Gi, `nfs://192.168.42.8:/volume1/Kubernetes/pipelines/arc`, `persistentVolumeReclaimPolicy: Retain`, label `app: pipeline-arc`)
|
||
- _Requirements: 1.2, 17.1, 17.4_
|
||
|
||
- [x] 2. Create ARC (Actions Runner Controller) manifests
|
||
- [x] 2.1 Create `~/sources/kube/pipelines/arc/values.yaml` — Helm values for the ARC controller chart (`oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller`), namespace `arc-system`
|
||
- _Requirements: 1.1, 4.1_
|
||
- [x] 2.2 Create `~/sources/kube/pipelines/arc/runner-scaleset.yaml` — RunnerScaleSet CR for `celesrenata/stonks-oracle` repo with label `self-hosted-gremlin`, `containerMode.type: kubernetes`, ephemeral pods, 2 CPU / 4Gi memory limits
|
||
- _Requirements: 4.1, 4.2, 4.3, 4.4_
|
||
|
||
- [x] 3. Create ArgoCD manifests
|
||
- [x] 3.1 Create `~/sources/kube/pipelines/argocd/values.yaml` — Helm values for `argo/argo-cd` chart in `argocd` namespace, with Traefik ingress at `stonks-argocd.celestium.life`, TLS via `ca-issuer`, NFS PVC for persistence
|
||
- _Requirements: 1.1, 1.3, 18.1_
|
||
- [x] 3.2 Create `~/sources/kube/pipelines/argocd/repo-secret.yaml` — Kubernetes Secret with Git credentials for the `celesrenata/stonks-oracle` repository, namespace `argocd`
|
||
- _Requirements: 18.1_
|
||
- [x] 3.3 Create `~/sources/kube/pipelines/argocd/apps/stonks-beta.yaml` — ArgoCD Application for beta stage, pointing at `infra/helm/stonks-oracle/` with `values-beta.yaml`, target namespace `stonks-beta`, auto-sync with prune and selfHeal
|
||
- _Requirements: 8.1, 8.4, 18.2, 18.3_
|
||
- [x] 3.4 Create `~/sources/kube/pipelines/argocd/apps/stonks-paper.yaml` — ArgoCD Application for paper stage, pointing at `infra/helm/stonks-oracle/` with `values-paper.yaml`, target namespace `stonks-paper`, auto-sync with prune and selfHeal
|
||
- _Requirements: 9.1, 9.5, 18.2, 18.3_
|
||
- [x] 3.5 Create `~/sources/kube/pipelines/argocd/apps/stonks-live.yaml` — ArgoCD Application for live stage, pointing at `infra/helm/stonks-oracle/` with `values.yaml`, target namespace `stonks-oracle`, auto-sync with prune and selfHeal
|
||
- _Requirements: 10.2, 10.5, 18.2, 18.3_
|
||
|
||
- [x] 4. Checkpoint — Verify ArgoCD and ARC manifests
|
||
- Ensure all YAML manifests are syntactically valid. Review that each ArgoCD Application points at the correct chart path, values file, and target namespace. Ask the user if questions arise.
|
||
|
||
- [x] 5. Create Kargo manifests
|
||
- [x] 5.1 Create `~/sources/kube/pipelines/kargo/values.yaml` — Helm values for `oci://ghcr.io/akuity/kargo-charts/kargo` in `kargo` namespace, with Traefik ingress at `stonks-kargo.celestium.life`, TLS via `ca-issuer`, NFS PVC for persistence
|
||
- _Requirements: 1.1, 1.4, 16.6_
|
||
- [x] 5.2 Create `~/sources/kube/pipelines/kargo/project.yaml` — Kargo Project resource `stonks-oracle` in `stonks-oracle` namespace
|
||
- _Requirements: 8.2, 14.1_
|
||
- [x] 5.3 Create `~/sources/kube/pipelines/kargo/warehouse.yaml` — Kargo Warehouse `stonks-images` watching `ghcr.io/celesrenata/stonks-oracle/query-api` for new image tags
|
||
- _Requirements: 6.5, 14.1_
|
||
- [x] 5.4 Create `~/sources/kube/pipelines/kargo/stages/beta.yaml` — Kargo Stage for beta with auto-promotion enabled, promotion template that updates `image.tag` in the `stonks-beta` ArgoCD Application
|
||
- _Requirements: 8.1, 8.3, 13.1_
|
||
- [x] 5.5 Create `~/sources/kube/pipelines/kargo/stages/paper.yaml` — Kargo Stage for paper with manual promotion, market-hours verification step (AnalysisTemplate), promotion template that updates `image.tag` in the `stonks-paper` ArgoCD Application
|
||
- _Requirements: 9.1, 9.3, 9.4, 11.1, 11.2, 13.1_
|
||
- [x] 5.6 Create `~/sources/kube/pipelines/kargo/stages/live.yaml` — Kargo Stage for live with manual approval + required notes, market-hours verification step, promotion template that updates `image.tag` in the `stonks-live` ArgoCD Application
|
||
- _Requirements: 10.1, 10.3, 10.4, 11.1, 11.2, 12.1, 12.3, 13.1_
|
||
- [x] 5.7 Create `~/sources/kube/pipelines/kargo/project-config.yaml` — Kargo ProjectConfig with per-stage `autoPromotionEnabled` settings (beta: true, paper: false, live: false)
|
||
- _Requirements: 13.1, 13.2, 13.3_
|
||
|
||
- [x] 6. Create market-hours AnalysisTemplate
|
||
- [x] 6.1 Create the AnalysisTemplate manifest for market-hours verification — runs an Alpine container that checks Eastern Time (09:30–16:00 ET, Mon–Fri), exits 0 outside market hours, exits 1 during market hours. Uses `America/New_York` timezone for DST correctness. Place in `~/sources/kube/pipelines/kargo/` directory.
|
||
- _Requirements: 11.1, 11.2, 11.4_
|
||
|
||
- [x] 7. Checkpoint — Verify Kargo manifests and promotion DAG
|
||
- Ensure Kargo stages form the correct linear DAG: beta → paper → live. Verify market-hours AnalysisTemplate is referenced by paper and live stages. Ensure all YAML is syntactically valid. Ask the user if questions arise.
|
||
|
||
- [x] 8. Create Helm values files for beta and paper stages (in stonks-oracle repo)
|
||
- [x] 8.1 Create `infra/helm/stonks-oracle/values-beta.yaml` — lighter resources, `BROKER_MODE: mock`, `BROKER_PROVIDER: mock`, `LOG_LEVEL: DEBUG`, `TRADING_ENABLED: false`, single replicas per service
|
||
- _Requirements: 8.4, 9.2_
|
||
- [x] 8.2 Create `infra/helm/stonks-oracle/values-paper.yaml` — paper broker config, `BROKER_MODE: paper`, `BROKER_PROVIDER: alpaca`, `BROKER_BASE_URL: https://paper-api.alpaca.markets`, `LOG_LEVEL: INFO`, `TRADING_ENABLED: true`
|
||
- _Requirements: 9.2, 9.5_
|
||
|
||
- [x] 9. Update GitHub Actions workflow (in stonks-oracle repo)
|
||
- [x] 9.1 Update `.github/workflows/build.yml` — change `runs-on: ubuntu-latest` to `runs-on: self-hosted-gremlin` on all jobs (`lint-and-test`, `build-services`, `build-dashboard`, `build-superset`)
|
||
- _Requirements: 5.1, 4.2_
|
||
- [x] 9.2 Add `integration-test` job to `.github/workflows/build.yml` — depends on `build-services` and `build-dashboard`, runs only on push to main, invokes `bash infra/inttest/run_pipeline.sh --image-tag ${{ github.sha }} --results-file inttest-results.json`, uploads `inttest-results.json` as a build artifact via `actions/upload-artifact@v4`
|
||
- _Requirements: 7.1, 7.2, 7.3, 7.4, 7.5_
|
||
|
||
- [x] 10. Checkpoint — Verify workflow and values files
|
||
- Ensure the updated workflow YAML is syntactically valid. Verify the integration-test job has correct `needs`, `if` condition, and artifact upload. Confirm values-beta.yaml and values-paper.yaml are valid Helm values. Ask the user if questions arise.
|
||
|
||
- [x] 11. Create install and teardown scripts
|
||
- [x] 11.1 Create `~/sources/kube/pipelines/runmefirst.sh` — full install script: create namespaces (`arc-system`, `argocd`, `kargo`, `stonks-beta`, `stonks-paper`), apply PVs, install ARC controller via Helm, apply runner scaleset, install ArgoCD via Helm with values, apply repo secret + ArgoCD Applications, install Kargo via Helm with values, apply Kargo project + warehouse + stages. Use `set -euo pipefail`, idempotent namespace creation via `--dry-run=client -o yaml | kubectl apply -f -`
|
||
- _Requirements: 1.1, 1.2, 1.5, 3.1_
|
||
- [x] 11.2 Create `~/sources/kube/pipelines/runmelast.sh` — teardown script: delete Kargo resources (stages, warehouse, project-config, project), uninstall Kargo Helm release, delete ArgoCD resources (apps, repo-secret), uninstall ArgoCD Helm release, delete ARC resources (runner-scaleset), uninstall ARC Helm release, delete namespaces (`arc-system`, `argocd`, `kargo`). Preserve PVs, NFS data, `stonks-oracle` namespace, `stonks-beta`, and `stonks-paper` namespaces. Use `--ignore-not-found` and `|| true` for idempotency.
|
||
- _Requirements: 2.1, 2.2, 2.3, 2.4, 3.2, 3.3, 17.2_
|
||
|
||
- [x] 12. Final checkpoint — Review all artifacts
|
||
- Ensure all files are created in the correct locations: pipeline scripts in `~/sources/kube/pipelines/`, Helm values and workflow changes in the stonks-oracle repo. Verify install order in `runmefirst.sh` matches design (PVs → ARC → ArgoCD → Kargo). Verify teardown order in `runmelast.sh` is reverse (Kargo → ArgoCD → ARC). Ensure all tests pass, ask the user if questions arise.
|
||
|
||
## Notes
|
||
|
||
- Pipeline infrastructure scripts (`~/sources/kube/pipelines/`) are created on gremlin-1, separate from the stonks-oracle repo
|
||
- Helm values files (`values-beta.yaml`, `values-paper.yaml`) and the GitHub Actions workflow update are in the stonks-oracle repo
|
||
- No property-based tests — this feature is entirely IaC (shell scripts, YAML manifests, Helm values)
|
||
- The existing `values.yaml` (production) is not modified — live stage uses it as-is
|
||
- PVs use `persistentVolumeReclaimPolicy: Retain` so NFS data survives teardowns
|
||
- Break-glass is Kargo's built-in manual approval — no custom code needed (Requirements 12.1–12.5)
|
||
- Audit trail is provided by Kargo's native promotion history (Requirements 15.1–15.4)
|
||
- Kargo Dashboard features (stage display, promotion controls, block indicators) are provided by the Kargo chart out of the box (Requirements 14.1–14.3, 16.1–16.5)
|
||
- Each task references specific requirements for traceability
|
||
- Checkpoints ensure incremental validation between major phases
|