# Local CI/CD Pipeline — Design

## Overview

This design replaces the GitHub-dependent CI/CD pipeline (ARC + GHCR) with a fully local pipeline using Gitea as the Git forge, Woodpecker CI for pipeline execution, and the existing local Docker registry at `registry.celestium.life` for image storage. The existing ArgoCD and Kargo infrastructure is retained for GitOps deployment and staged promotion, with configuration updates to point at local sources instead of GitHub/GHCR.

The migration touches five areas:

1. **Gitea configuration** — Complete initial setup (admin user, OAuth2 app), create the `stonks-oracle` repository, and configure webhooks for Woodpecker CI. Gitea is already deployed in the `git-server` namespace but unconfigured.
2. **Woodpecker CI deployment** — Deploy server and agent via the `woodpecker/woodpecker` Helm chart in the `woodpecker` namespace. The server authenticates with Gitea via OAuth2. The agent uses the Kubernetes backend, executing each pipeline step as a standalone Pod.
3. **Pipeline file** — Create `.woodpecker.yml` translating the existing GitHub Actions workflow into Woodpecker's native format, targeting the local registry and adding a GitHub mirror step.
4. **ArgoCD/Kargo updates** — Update the ArgoCD repo secret to point at Gitea, update the ArgoCD Applications to source from Gitea, and update the Kargo Warehouse to watch the local registry.
5. **ARC teardown** — Remove ARC controller, runner scale set, RBAC, PV, and `arc-system` namespace.

### Key Design Decisions

1. **Woodpecker with Kubernetes backend (not Docker-in-Docker agent)** — The Woodpecker agent uses `WOODPECKER_BACKEND: kubernetes`, executing each pipeline step as a standalone Pod in the `woodpecker` namespace. A temporary PVC is created per pipeline run to transfer files between steps. This avoids DinD complexity for most steps. Image builds use the `woodpeckerci/plugin-docker-buildx` plugin with privileged mode for the build step only.

2. **Gitea API for initial setup** — Gitea's initial setup (admin user creation, OAuth2 app registration, repo creation) is automated via Gitea's REST API in `runmefirst.sh`. This avoids manual web UI interaction and makes the setup reproducible.

3. **Single Helm chart for Woodpecker** — The `woodpecker/woodpecker` chart contains both server and agent subcharts. One `helm install` deploys both components. The agent connects to the server via the in-cluster service `woodpecker-server:9000`.

4. **NFS PV for Woodpecker** — Woodpecker server data (SQLite database, build logs) persists on an NFS volume at `nfs://192.168.42.8:/volume1/Kubernetes/pipelines/woodpecker`, surviving cluster rebuilds. The ARC PV is removed since ARC is being torn down.

5. **GitHub as read-only mirror** — After all CI steps pass, a final pipeline step pushes to GitHub via an SSH key stored as a Woodpecker secret. A GitHub mirror failure does not block image promotion or deployment.

6. **ArgoCD sources from Gitea** — ArgoCD's repo secret is updated to point at the Gitea repository URL. All three Applications (beta, paper, live) source Helm charts from Gitea instead of GitHub.

7. **Helm chart image registry update** — The base `values.yaml` changes `image.registry` from `ghcr.io/celesrenata/stonks-oracle` to `registry.celestium.life/stonks-oracle`. The `ghcrAuth` section and `ghcr-credentials` imagePullSecret are removed since the local registry requires no authentication.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                           Gremlin Cluster (4x NixOS)                            │
│                                                                                 │
│   ┌─────────────────┐   ┌──────────────────┐   ┌───────────────────────────┐    │
│   │ git-server ns   │   │ woodpecker ns    │   │ argocd ns                 │    │
│   │ (pre-existing)  │   │ (NEW)            │   │ (existing, updated)       │    │
│   │                 │   │                  │   │                           │    │
│   │ Gitea           │   │ WP Server        │   │ ArgoCD Server             │    │
│   │ 10.1.1.x:30300  │   │ (StatefulSet)    │   │ (stonks-argocd.           │    │
│   │ :30022 (SSH)    │   │ stonks-ci.       │   │  celestium.life)          │    │
│   │                 │   │ celestium.life   │   │                           │    │
│   │ Local Registry  │   │                  │   │ Repo: Gitea (updated)     │    │
│   │ registry.       │   │ WP Agent         │   │                           │    │
│   │ celestium.life  │   │ (Deployment)     │   │                           │    │
│   │ :30500          │   │ K8s backend      │   │                           │    │
│   └─────────────────┘   └──────────────────┘   └───────────────────────────┘    │
│                                                                                 │
│   ┌─────────────────┐   ┌──────────────────┐   ┌───────────────────────────┐    │
│   │ kargo ns        │   │ stonks-beta ns   │   │ stonks-oracle ns          │    │
│   │ (existing,      │   │                  │   │ (live/production)         │    │
│   │  updated)       │   │ ArgoCD App:      │   │                           │    │
│   │                 │   │ stonks-beta      │   │ ArgoCD App: stonks-live   │    │
│   │ Warehouse:      │   │ images from      │   │ images from               │    │
│   │ local registry  │   │ local registry   │   │ local registry            │    │
│   │ (updated)       │   │                  │   │                           │    │
│   └─────────────────┘   └──────────────────┘   └───────────────────────────┘    │
│                                                                                 │
│ NFS: nfs://192.168.42.8:/volume1/Kubernetes/pipelines/{argocd,kargo,woodpecker} │
└─────────────────────────────────────────────────────────────────────────────────┘
```

### Pipeline Flow

```mermaid
graph LR
    A[Git Push to Gitea] --> B[Webhook → Woodpecker CI]
    B --> C[Lint + Test<br/>Python ruff + pytest<br/>Frontend vitest]
    C --> D[Build + Push<br/>all images to<br/>local registry]
    D --> E[Integration Tests<br/>run_pipeline.sh]
    E -->|pass| F[GitHub Mirror<br/>git push]
    E -->|fail| X[❌ Pipeline Failed]
    F --> G[Kargo Warehouse<br/>detects new tag<br/>in local registry]
    G --> H[Beta Stage<br/>auto-promote]
    H --> I{Market Hours?}
    I -->|outside| J[Paper Stage]
    I -->|during| K[🚫 Blocked]
    K -->|break-glass| J
    J --> L{Market Hours?}
    L -->|outside| M[Live Stage<br/>manual approval]
    L -->|during| N[🚫 Blocked]
    N -->|break-glass| M
```

## Components and Interfaces

### 1. Gitea Configuration (`pipelines/gitea/`)

Gitea is already deployed in the `git-server` namespace but needs initial setup. The configuration is automated via shell scripts that call Gitea's REST API.

**Setup Steps (in `runmefirst.sh`):**

1. **Complete initial setup** — POST to `http://<gitea-svc>:3000/` with admin credentials to complete the install wizard, or use the Gitea API to create the admin user if the instance is already initialized.
2. **Create OAuth2 application** — POST to `/api/v1/user/applications/oauth2` to register Woodpecker CI with callback URL `https://stonks-ci.celestium.life/authorize`. Store the returned `client_id` and `client_secret` for Woodpecker's Helm values.
3. **Create repository** — POST to `/api/v1/user/repos` to create the `stonks-oracle` repository.
4. **Add Gitea remote to local repo** — Configure the local Git clone on gremlin-1 with the Gitea remote and push the existing codebase.

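
Steps 2 and 3 could be scripted roughly as follows. This is a minimal sketch, not the contents of the actual setup script: the helper names and the `GITEA_URL`, `GITEA_USER`, and `GITEA_PASS` variables are hypothetical, basic-auth access with the admin account is assumed, and the JSON field names should be checked against the API reference for the deployed Gitea version.

```shell
# Hypothetical helpers for pipelines/gitea/setup.sh; names and variables are assumptions.
GITEA_URL="${GITEA_URL:-http://gitea-http.git-server.svc.cluster.local:3000}"

# Build the JSON body for POST /api/v1/user/applications/oauth2.
oauth2_payload() {
  # $1 = application name, $2 = redirect (callback) URI
  printf '{"name":"%s","redirect_uris":["%s"],"confidential_client":true}' "$1" "$2"
}

# Build the JSON body for POST /api/v1/user/repos.
repo_payload() {
  # $1 = repository name
  printf '{"name":"%s"}' "$1"
}

# POST a JSON body to a Gitea API path and print the response body.
gitea_post() {
  # $1 = API path, $2 = JSON body
  curl -sS -u "${GITEA_USER}:${GITEA_PASS}" \
    -H 'Content-Type: application/json' \
    -X POST "${GITEA_URL}$1" -d "$2"
}

# Usage (not executed here; assumes jq is installed for parsing the response):
#   resp="$(gitea_post /api/v1/user/applications/oauth2 \
#     "$(oauth2_payload woodpecker https://stonks-ci.celestium.life/authorize)")"
#   client_id="$(printf '%s' "$resp" | jq -r '.client_id')"
#   gitea_post /api/v1/user/repos "$(repo_payload stonks-oracle)"
```

Keeping payload construction separate from the HTTP call makes the request bodies easy to inspect without a live Gitea instance.
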
**Gitea Service Access:**
- Web UI: `http://gitea-http.git-server.svc.cluster.local:3000` (cluster-internal) / `10.1.1.x:30300` (NodePort)
- SSH: `:30022` (NodePort)
- API: `http://gitea-http.git-server.svc.cluster.local:3000/api/v1/`

**Webhook Configuration:**
Woodpecker CI automatically registers webhooks when a repository is activated through the Woodpecker dashboard or API. The webhook URL points to the Woodpecker server's internal service endpoint.

### 2. Woodpecker CI Server and Agent (`pipelines/woodpecker/`)

**Namespace:** `woodpecker`

**Helm Chart:** `woodpecker/woodpecker` from the [woodpecker-ci/helm](https://github.com/woodpecker-ci/helm) repository. Contains two subcharts: `server` and `agent`.

**Server Configuration:**
- StatefulSet with 1 replica
- Persistent volume for SQLite database and build data at `/var/lib/woodpecker`
- NFS-backed PV at `nfs://192.168.42.8:/volume1/Kubernetes/pipelines/woodpecker`
- Traefik ingress at `stonks-ci.celestium.life` with TLS via `ca-issuer`
- Gitea OAuth2 authentication via `WOODPECKER_GITEA=true`, `WOODPECKER_GITEA_URL`, `WOODPECKER_GITEA_CLIENT`, `WOODPECKER_GITEA_SECRET`
- `WOODPECKER_HOST=https://stonks-ci.celestium.life`
- `WOODPECKER_ADMIN=admin` (matches Gitea admin username)

**Agent Configuration:**
- Deployment with 2 replicas
- Kubernetes backend (`WOODPECKER_BACKEND: kubernetes`)
- Pipeline steps execute as standalone Pods in the `woodpecker` namespace
- Temporary PVC created per pipeline run for file transfer between steps
- `WOODPECKER_BACKEND_K8S_STORAGE_CLASS: ""` (use default)
- `WOODPECKER_BACKEND_K8S_VOLUME_SIZE: 10G`
- ServiceAccount with RBAC for creating Pods, Services, PVCs in the `woodpecker` namespace
- Additional ClusterRoleBinding for integration test steps that need to create ephemeral namespaces

**Helm Values Structure (`pipelines/woodpecker/values.yaml`):**
```yaml
server:
  enabled: true
  env:
    WOODPECKER_HOST: "https://stonks-ci.celestium.life"
    WOODPECKER_GITEA: "true"
    WOODPECKER_GITEA_URL: "http://gitea-http.git-server.svc.cluster.local:3000"
    WOODPECKER_GITEA_CLIENT: "<from-oauth2-setup>"
    WOODPECKER_GITEA_SECRET: "<from-oauth2-setup>"
    WOODPECKER_ADMIN: "admin"
  ingress:
    enabled: true
    ingressClassName: traefik
    hosts:
      - host: stonks-ci.celestium.life
        paths:
          - path: /
            backend:
              serviceName: woodpecker-server
              servicePort: 80
    tls:
      - secretName: woodpecker-tls
        hosts:
          - stonks-ci.celestium.life
    annotations:
      cert-manager.io/cluster-issuer: ca-issuer
  persistentVolume:
    enabled: true
    size: 5Gi
    storageClass: ""

agent:
  enabled: true
  replicaCount: 2
  env:
    WOODPECKER_SERVER: "woodpecker-server:9000"
    WOODPECKER_BACKEND: kubernetes
    WOODPECKER_BACKEND_K8S_NAMESPACE: woodpecker
    WOODPECKER_BACKEND_K8S_VOLUME_SIZE: 10G
    WOODPECKER_BACKEND_K8S_STORAGE_RWX: "true"
```

**Network Policy:**
A NetworkPolicy in the `woodpecker` namespace allows Traefik ingress traffic to the Woodpecker server on its HTTP port (80).

### 3. Woodpecker Pipeline File (`.woodpecker.yml`)

The pipeline file translates the existing GitHub Actions workflow into Woodpecker's native format. Each step runs in its own container; under the Kubernetes backend used here, that container is a standalone Pod.

**Pipeline Structure:**

```
.woodpecker.yml
├── lint-python          (ruff check services/)
├── test-python          (pytest tests/)
├── test-frontend        (npm ci && npx vitest --run)
├── build-<service>      (×12 Python services, sequential or grouped)
├── build-dashboard      (frontend/Dockerfile)
├── build-superset       (docker/Dockerfile.superset)
├── integration-test     (run_pipeline.sh)
└── mirror-github        (git push to GitHub)
```

**Key Differences from GitHub Actions:**
- No `uses:` syntax — each step specifies an `image:` and `commands:` or uses a Woodpecker plugin
- Image builds use `woodpeckerci/plugin-docker-buildx` plugin with `settings.repo`, `settings.registry`, `settings.tags`
- Branch filtering via `when: { branch: main, event: push }` instead of GitHub's `if:` conditions
- Secrets referenced via `from_secret:` instead of `${{ secrets.X }}`
- No per-step matrix builds: Woodpecker's `matrix:` applies to an entire workflow rather than to individual steps, so services are built sequentially or via multiple steps

**Image Tagging:**
All images are pushed to `registry.celestium.life/stonks-oracle/<service>:<sha>` and `registry.celestium.life/stonks-oracle/<service>:latest`.

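
The tagging scheme can be illustrated with a small helper. This is only a sketch of the naming convention: `image_refs` is a hypothetical function, not part of the pipeline, and `query-api` and `abc123` are example values.

```shell
REGISTRY="registry.celestium.life/stonks-oracle"

# Print the two references pushed for one service: the SHA tag and `latest`.
image_refs() {
  # $1 = service name, $2 = git commit SHA
  printf '%s/%s:%s\n' "$REGISTRY" "$1" "$2"
  printf '%s/%s:latest\n' "$REGISTRY" "$1"
}

image_refs query-api abc123
# prints:
#   registry.celestium.life/stonks-oracle/query-api:abc123
#   registry.celestium.life/stonks-oracle/query-api:latest
```

The SHA-tagged reference is the one that later stages promote; `latest` is a convenience tag that moves with every build.
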
**GitHub Mirror Step:**
Uses the `woodpeckerci/plugin-git-push` plugin or a custom step with `git push --mirror` using an SSH deploy key stored as a Woodpecker secret.

### 4. ArgoCD Updates

**Repo Secret Update (`pipelines/argocd/repo-secret.yaml`):**
Change the repository URL from GitHub to Gitea:
```yaml
stringData:
  url: http://gitea-http.git-server.svc.cluster.local:3000/admin/stonks-oracle.git
  type: git
  username: admin
  password: <gitea-admin-password>
```

**Application Updates (`pipelines/argocd/apps/*.yaml`):**
All three Applications (stonks-beta, stonks-paper, stonks-live) update `spec.source.repoURL` from `https://github.com/celesrenata/stonks-oracle.git` to the Gitea repository URL.

### 5. Kargo Warehouse Update

**Warehouse Update (`pipelines/kargo/warehouse.yaml`):**
Change the image subscription from GHCR to the local registry:
```yaml
spec:
  subscriptions:
    - image:
        repoURL: registry.celestium.life/stonks-oracle/query-api
```

Kargo stages, project, project-config, and market-hours AnalysisTemplate remain unchanged.

### 6. Helm Chart Updates (`infra/helm/stonks-oracle/`)

**`values.yaml` changes:**
```yaml
image:
  registry: registry.celestium.life/stonks-oracle  # was: ghcr.io/celesrenata/stonks-oracle
  pullPolicy: Always
  tag: latest

# REMOVED: imagePullSecrets, ghcrAuth sections
```

**`values-beta.yaml` and `values-paper.yaml`:**
No changes needed — they inherit `image.registry` from the base `values.yaml` and only override `image.tag`.

### 7. ARC Teardown

The `runmefirst.sh` script tears down ARC before installing Woodpecker:

1. `helm uninstall arc-runner-set --namespace arc-system || true`
2. `helm uninstall arc --namespace arc-system || true`
3. `kubectl delete -f arc/runner-rbac.yaml --ignore-not-found`
4. `kubectl delete pv pipeline-arc-pv --ignore-not-found`
5. `kubectl delete namespace arc-system --ignore-not-found`

The `pipelines/arc/` directory and `pipelines/pvs/arc-pv.yaml` are removed from the repo.

### 8. NFS Persistent Volumes

**Updated PV set** (ARC PV removed, Woodpecker PV added):

| PV Name | NFS Path | Capacity | Bound To |
|---|---|---|---|
| `pipeline-argocd-pv` | `/volume1/Kubernetes/pipelines/argocd` | 5Gi | PVC in `argocd` ns |
| `pipeline-kargo-pv` | `/volume1/Kubernetes/pipelines/kargo` | 2Gi | PVC in `kargo` ns |
| `pipeline-woodpecker-pv` | `/volume1/Kubernetes/pipelines/woodpecker` | 5Gi | PVC in `woodpecker` ns |

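
A PV manifest for one row of this table could be rendered as in the sketch below. The `render_pv` helper is hypothetical, and the access mode and reclaim policy are assumptions chosen to match the goal of preserving data across teardowns; the real manifests under `pipelines/pvs/` may differ.

```shell
NFS_SERVER="192.168.42.8"

# Print a PersistentVolume manifest for one row of the PV table.
render_pv() {
  # $1 = PV name, $2 = NFS path, $3 = capacity
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: $1
spec:
  capacity:
    storage: $3
  accessModes:
    - ReadWriteOnce                       # assumption: single writer per PV
  persistentVolumeReclaimPolicy: Retain   # assumption: keep NFS data when the claim is released
  nfs:
    server: ${NFS_SERVER}
    path: $2
EOF
}

# Usage (not executed here):
#   render_pv pipeline-woodpecker-pv /volume1/Kubernetes/pipelines/woodpecker 5Gi | kubectl apply -f -
```
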
### 9. Updated `runmefirst.sh`

```
#!/bin/bash
set -euo pipefail

# 1. Tear down ARC (if present)
#    - Uninstall ARC Helm releases
#    - Delete RBAC, PV, namespace

# 2. Create namespaces (woodpecker, argocd, kargo, stonks-beta, stonks-paper)

# 3. Create NFS PVs (argocd, kargo, woodpecker)

# 4. Configure Gitea
#    - Complete initial setup via API
#    - Create admin user (if needed)
#    - Create OAuth2 app for Woodpecker
#    - Create stonks-oracle repository

# 5. Install Woodpecker CI via Helm
#    - Inject Gitea OAuth2 client_id and client_secret into values
#    - Apply NetworkPolicy for Traefik ingress

# 6. Install ArgoCD via Helm
#    - Apply updated repo secret (pointing to Gitea)
#    - Apply ArgoCD Applications

# 7. Install Kargo via Helm
#    - Apply project, project-config, warehouse (local registry), stages

# 8. Apply Woodpecker agent RBAC for integration tests
```

### 10. Updated `runmelast.sh`

```
#!/bin/bash
set -euo pipefail

# Reverse order: Kargo → ArgoCD → Woodpecker
# Preserves: PVs, NFS data, git-server namespace (Gitea + registry)

# 1. Remove Kargo resources + Helm release
# 2. Remove ArgoCD resources + Helm release
# 3. Remove Woodpecker Helm release
# 4. Delete namespaces (woodpecker, argocd, kargo)
# 5. PVs intentionally NOT deleted
```

### 11. Woodpecker Agent RBAC

The Woodpecker agent's service account needs:
- **Namespace-scoped RBAC** (auto-created by Helm chart): Create/delete Pods, Services, PVCs in the `woodpecker` namespace for pipeline step execution.
- **ClusterRoleBinding** (manually applied): Grant the agent service account `cluster-admin` for integration test steps that create ephemeral namespaces and deploy sandbox infrastructure. This mirrors the existing ARC runner RBAC pattern.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: woodpecker-agent-inttest
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: woodpecker-agent
    namespace: woodpecker
```

## Data Models

### Pipeline Infrastructure Layout

```
~/sources/kube/pipelines/
├── runmefirst.sh                  # Full install: ARC teardown → Gitea config → Woodpecker → ArgoCD → Kargo
├── runmelast.sh                   # Teardown: Kargo → ArgoCD → Woodpecker (preserves PVs, git-server)
├── gitea/
│   └── setup.sh                   # Gitea API setup: admin user, OAuth2 app, repo creation
├── woodpecker/
│   ├── values.yaml                # Woodpecker Helm values (server + agent)
│   ├── network-policy.yaml        # NetworkPolicy for Traefik → Woodpecker server
│   └── agent-rbac.yaml            # ClusterRoleBinding for integration test access
├── argocd/
│   ├── values.yaml                # ArgoCD Helm values (unchanged)
│   ├── repo-secret.yaml           # Updated: points to Gitea instead of GitHub
│   └── apps/
│       ├── stonks-beta.yaml       # Updated: repoURL → Gitea
│       ├── stonks-paper.yaml      # Updated: repoURL → Gitea
│       └── stonks-live.yaml       # Updated: repoURL → Gitea
├── kargo/
│   ├── values.yaml                # Kargo Helm values (unchanged)
│   ├── project.yaml               # Kargo Project (unchanged)
│   ├── project-config.yaml        # Kargo ProjectConfig (unchanged)
│   ├── warehouse.yaml             # Updated: watches local registry
│   ├── market-hours-check.yaml    # AnalysisTemplate (unchanged)
│   └── stages/
│       ├── beta.yaml              # Kargo Stage (unchanged)
│       ├── paper.yaml             # Kargo Stage (unchanged)
│       └── live.yaml              # Kargo Stage (unchanged)
└── pvs/
    ├── argocd-pv.yaml             # NFS PV for ArgoCD (unchanged)
    ├── kargo-pv.yaml              # NFS PV for Kargo (unchanged)
    └── woodpecker-pv.yaml         # NFS PV for Woodpecker (NEW, replaces arc-pv.yaml)
```

### Removed Files

```
pipelines/arc/                     # Entire directory removed
├── values.yaml
├── runner-scaleset.yaml
└── runner-rbac.yaml
pipelines/pvs/arc-pv.yaml          # ARC PV removed
```

### Image Tag Flow (Updated)

```
Git SHA (e.g., abc123)
  → Woodpecker builds: registry.celestium.life/stonks-oracle/<service>:abc123
  → Integration test: run_pipeline.sh --image-tag abc123
  → GitHub mirror: git push (non-blocking)
  → Kargo Warehouse detects: abc123 in local registry
  → Kargo Freight created: abc123
  → Beta: helm upgrade with image.tag=abc123
  → Paper: helm upgrade with image.tag=abc123 (after market-hours check)
  → Live: helm upgrade with image.tag=abc123 (after approval + market-hours check)
```

### Kargo Resource Relationships (Updated)

```mermaid
graph TD
    W[Warehouse: stonks-images<br/>watches LOCAL REGISTRY<br/>registry.celestium.life] -->|produces| F[Freight<br/>image tag = git SHA]
    F -->|auto-promote| SB[Stage: beta<br/>ArgoCD App: stonks-beta]
    SB -->|verified → available| SP[Stage: paper<br/>market-hours verification<br/>ArgoCD App: stonks-paper]
    SP -->|verified → available| SL[Stage: live<br/>manual approval + market-hours<br/>ArgoCD App: stonks-live]
```

## Error Handling

### Gitea Setup Failures

| Failure | Detection | Recovery |
|---|---|---|
| Gitea not reachable | API call returns connection error | Check Gitea pod status in `git-server` namespace. Verify NodePort service. |
| Admin user already exists | API returns 422 | Script continues — idempotent. |
| OAuth2 app already exists | API returns 422 | Script queries existing apps and reuses credentials. |
| Repository already exists | API returns 409 | Script continues — idempotent. |

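
The idempotency rules in this table amount to a small mapping from HTTP status to action. The following sketch shows one way to express it; `classify_create_status` is a hypothetical helper, and because a 422 can also signal other validation errors, a real script may want to inspect the response body before treating it as "already exists".

```shell
# Map the HTTP status of a Gitea create call to an action for the setup script.
classify_create_status() {
  # $1 = HTTP status code
  case "$1" in
    200|201) echo created ;;
    409|422) echo exists ;;   # treated as "already there", per the table above
    *)       echo error ;;
  esac
}

# Usage (not executed here):
#   status="$(curl -s -o /dev/null -w '%{http_code}' -X POST ...)"
#   [ "$(classify_create_status "$status")" = error ] && exit 1
```
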
### Woodpecker Deployment Failures

| Failure | Detection | Recovery |
|---|---|---|
| Helm install fails | Non-zero exit | Check Helm chart repo access. Verify `woodpecker` namespace exists. |
| Server can't reach Gitea | OAuth2 login fails | Verify `WOODPECKER_GITEA_URL` resolves within cluster. Check Gitea service. |
| Agent can't connect to server | Agent logs show connection errors | Verify `WOODPECKER_SERVER` env var matches server service name. Check agent secret. |
| Pipeline step Pod fails to schedule | Pod stuck in Pending | Check node resources. Verify RBAC allows Pod creation in `woodpecker` namespace. |
| Image build fails (privileged) | Build step exits non-zero | Verify containerd/k3s allows privileged Pods. Check `plugin-docker-buildx` logs. |

### Pipeline Failures

| Failure | Detection | Recovery |
|---|---|---|
| Lint/test fails | Step exits non-zero | Fix code, push again. Build steps are skipped. |
| Image push to local registry fails | Plugin exits non-zero | Check registry health at `registry.celestium.life`. Verify DNS resolution. |
| Integration test fails | `run_pipeline.sh` exits non-zero | Check Woodpecker dashboard for step logs. Fix and re-push. |
| GitHub mirror fails | Mirror step exits non-zero | Non-blocking — images are already in local registry. Fix SSH key and re-run. |

### ArgoCD/Kargo Update Failures

| Failure | Detection | Recovery |
|---|---|---|
| ArgoCD can't clone from Gitea | Application shows "ComparisonError" | Verify repo secret credentials. Check Gitea accessibility from ArgoCD namespace. |
| Kargo can't reach local registry | Warehouse shows error | Verify `registry.celestium.life` DNS resolves. Check registry pod health. |
| Image pull fails (k3s nodes) | Pods stuck in ImagePullBackOff | Ensure k3s containerd trusts the local registry. Add registry mirror config if needed. |

### Rollback Strategy

Same as existing design:
- **Beta/Paper**: Promote a previous Freight in Kargo to roll back the image tag.
- **Live**: Same mechanism with manual approval required.
- **Emergency**: Direct `helm upgrade` with previous image tag.

## Testing Strategy

### Why Property-Based Testing Does Not Apply

This feature is entirely Infrastructure as Code: shell scripts, Kubernetes YAML manifests, Helm values files, and a Woodpecker pipeline YAML file. There are no pure functions, parsers, serializers, or business logic with meaningful input variation. PBT requires universal properties across a wide input space — this feature has fixed configuration values and Kubernetes resource states. Running 100 iterations of "does the Woodpecker ingress have TLS enabled" adds no value over running it once.

### Testing Approach

The testing strategy uses three tiers:

#### Tier 1: Smoke Tests (Configuration Validation)

Run locally or in CI without a live cluster.

| Test | What It Validates | How |
|---|---|---|
| Manifest syntax | All YAML files parse correctly | `kubectl apply --dry-run=client -f <file>` |
| Helm template rendering | Woodpecker values produce valid K8s resources | `helm template` with values file |
| Pipeline file syntax | `.woodpecker.yml` is valid | Woodpecker CLI lint or YAML parse |
| Namespace isolation | Pipeline namespaces distinct from `stonks-oracle` and `git-server` | Grep manifests for namespace fields |
| NFS path separation | PVs use distinct subdirectories | Inspect PV YAML |
| Image registry references | All manifests reference `registry.celestium.life` not `ghcr.io` | Grep all YAML for registry URLs |
| No GHCR auth remnants | `ghcrAuth` and `ghcr-credentials` removed from Helm chart | Grep values.yaml |
| ArgoCD repo URL | All Applications point to Gitea, not GitHub | Inspect Application YAML |
| Kargo warehouse URL | Warehouse watches local registry | Inspect warehouse YAML |

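
The grep-based checks in this tier are simple to automate. Below is a minimal sketch of the "Image registry references" check; the `check_no_ghcr` function name and the choice to scan only `*.yaml` and `*.yml` files are assumptions for illustration.

```shell
# Succeed if no YAML file under the directory mentions ghcr.io;
# otherwise list the offending files on stderr and fail.
check_no_ghcr() {
  # $1 = directory to scan
  hits="$(find "$1" -type f \( -name '*.yaml' -o -name '*.yml' \) \
    -exec grep -l 'ghcr\.io' {} + || true)"
  if [ -n "$hits" ]; then
    printf 'ghcr.io still referenced in:\n%s\n' "$hits" >&2
    return 1
  fi
}

# Usage (not executed here):
#   check_no_ghcr pipelines/ && check_no_ghcr infra/helm/stonks-oracle/
```

The same pattern would extend to the `ghcrAuth` and `ghcr-credentials` checks by swapping the search string.
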
#### Tier 2: Integration Tests (Live Cluster Verification)

Run after `runmefirst.sh` on the Gremlin cluster.

| Test | What It Validates | How |
|---|---|---|
| Gitea accessible | Web UI responds | `curl http://10.1.1.x:30300` |
| Gitea repo exists | `stonks-oracle` repo created | Gitea API query |
| Woodpecker server running | Pods healthy in `woodpecker` namespace | `kubectl get pods -n woodpecker` |
| Woodpecker dashboard accessible | Web UI responds at `stonks-ci.celestium.life` | `curl -k https://stonks-ci.celestium.life` |
| Woodpecker OAuth2 works | Login redirects to Gitea | Browser test |
| ArgoCD accessible | Web UI responds at `stonks-argocd.celestium.life` | `curl -k https://stonks-argocd.celestium.life` |
| ArgoCD syncs from Gitea | Applications sync successfully | `argocd app get stonks-beta` |
| Kargo Warehouse | Discovers images from local registry | `kubectl get freight -n stonks-oracle` |
| Local registry accessible | Registry responds | `curl https://registry.celestium.life/v2/_catalog` |
| TLS certificates | Ingresses have valid certs from `ca-issuer` | `openssl s_client` or cert-manager status |
| PV binding | PVCs bound to NFS PVs | `kubectl get pvc -n woodpecker` |
| ARC removed | No ARC pods, no `arc-system` namespace | `kubectl get ns arc-system` returns NotFound |
| End-to-end pipeline | Push triggers build, images land in local registry | Push a commit, verify in Woodpecker dashboard |
| End-to-end promotion | Image flows beta → paper → live | Trigger promotion, verify deployments update |
| Teardown preservation | After `runmelast.sh`, PVs and NFS data intact | Run teardown, check PVs and NFS mount |

#### Tier 3: Market-Hours and Break-Glass Tests

Unchanged from the existing design: these tests validate Kargo behavior, which this migration does not modify.

| Test | What It Validates | How |
|---|---|---|
| Market-hours block | Promotion blocked during 09:30–16:00 ET | Run AnalysisTemplate during market hours |
| Market-hours allow | Promotion allowed outside hours | Run AnalysisTemplate outside hours |
| Break-glass override | Manual approval bypasses block | Use Kargo manual approval during hours |
| Break-glass audit | Records operator, timestamp, justification | Query Kargo audit trail |

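
The market-hours rule being tested can be stated as a small predicate. This sketch is not the AnalysisTemplate itself; `market_gate` is a hypothetical helper, and it assumes the window applies Monday through Friday only, ignores holidays, and treats 09:30 as blocked and 16:00 as allowed.

```shell
# Decide whether promotion is blocked, given a weekday (1=Mon .. 7=Sun) and HHMM in ET.
market_gate() {
  dow="$1"
  hhmm="$2"
  # Assumption: weekdays only, half-open window [09:30, 16:00).
  if [ "$dow" -ge 1 ] && [ "$dow" -le 5 ] && [ "$hhmm" -ge 930 ] && [ "$hhmm" -lt 1600 ]; then
    echo blocked
  else
    echo allowed
  fi
}

# Usage with the current time in New York (not executed here; requires tzdata):
#   market_gate "$(TZ=America/New_York date +%u)" "$(TZ=America/New_York date +%H%M)"
```

A predicate like this can help pick deterministic times for the block and allow test cases in the table above.
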