Production-ready GitOps reference architecture demonstrating modern Kubernetes deployment patterns with progressive delivery, policy enforcement, and observability.

    flowchart TB
        subgraph Developer["Developer Workflow"]
            DEV[Developer] -->|push code| GIT[GitHub]
            GIT -->|trigger| CI[GitHub Actions]
            CI -->|build & push| ECR[ECR Registry]
            CI -->|update| VALUES[Helm Values]
        end

        subgraph GitOps["GitOps Control Plane"]
            VALUES -->|watch| ARGO[ArgoCD]
            ARGO -->|sync| K8S[Kubernetes Cluster]
        end

        subgraph Cluster["EKS Cluster"]
            K8S --> ROLLOUT[Argo Rollouts]
            ROLLOUT -->|canary 10%| CANARY[Canary Pods]
            ROLLOUT -->|stable 90%| STABLE[Stable Pods]
        end

        subgraph Observability["Observability"]
            K8S --> PROM[Prometheus]
            PROM --> GRAF[Grafana]
            PROM -->|analysis| ROLLOUT
        end

Full architecture documentation: see docs/architecture.md.

| Capability | Implementation | Location |
|---|---|---|
| GitOps Workflow | ArgoCD with App of Apps pattern | argocd/ |
| Progressive Delivery | Argo Rollouts with canary analysis | rollouts/ |
| Infrastructure as Code | Terraform modules for EKS | terraform/ |
| Policy as Code | Kyverno admission policies | policies/ |
| Observability | Prometheus, Grafana, AlertManager | observability/ |
| Secret Management | External Secrets Operator + AWS Secrets Manager | secrets/ |
| Multi-Environment | Dev → Staging → Prod promotion | argocd/applicationsets/ |

    ├── terraform/                  # Infrastructure provisioning
    │   ├── modules/
    │   │   ├── eks/                # EKS cluster module
    │   │   ├── vpc/                # VPC networking
    │   │   └── argocd/             # ArgoCD bootstrap
    │   └── environments/
    │       └── dev/                # Environment configs
    ├── argocd/                     # ArgoCD application definitions
    │   ├── apps/                   # Application manifests
    │   ├── projects/               # ArgoCD projects (RBAC)
    │   └── applicationsets/        # Dynamic multi-env generation
    ├── helm/                       # Helm charts
    │   └── sample-app/             # Example application chart
    ├── rollouts/                   # Argo Rollouts strategies
    │   └── canary-strategy.yaml    # Canary with Prometheus analysis
    ├── policies/                   # Kyverno policies
    │   └── kyverno/                # Security & best practice policies
    ├── observability/              # Monitoring stack
    │   └── prometheus/             # Prometheus, Grafana, alerts
    ├── secrets/                    # External Secrets configuration
    ├── docs/                       # Documentation
    │   ├── architecture.md         # System architecture
    │   └── adr/                    # Architecture Decision Records
    └── .github/
        └── workflows/              # CI/CD pipelines

- AWS CLI configured with appropriate credentials
- Terraform >= 1.6
- kubectl >= 1.28
- Helm >= 3.13
- ArgoCD CLI (optional)
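
The minimum versions listed above can be checked before provisioning. The following is a minimal sketch, not part of this repository: `version_ge` and `check_tool` are hypothetical helpers, the version strings are hard-coded placeholders rather than real tool output, and the comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/usr/bin/env bash
# Sketch: compare a tool version against a required minimum.
# version_ge and check_tool are hypothetical helpers for illustration only.

# Returns 0 when version $1 is greater than or equal to version $2.
# Uses `sort -V` (version sort): if the minimum sorts first, $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints a one-line verdict for a tool name, its version, and the minimum.
check_tool() {
  if version_ge "$2" "$3"; then
    echo "$1 $2: ok (>= $3)"
  else
    echo "$1 $2: too old (need >= $3)"
  fi
}

# Placeholder versions; substitute the versions reported by your own tools.
check_tool terraform 1.7.5 1.6
check_tool kubectl 1.27.3 1.28
check_tool helm 3.14.2 3.13
```

In practice the second argument would come from each tool's own version command; parsing that output is left out here because the format differs per tool.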

Provision the infrastructure:

    cd terraform/environments/dev
    terraform init
    terraform plan
    terraform apply

Configure kubectl:

    aws eks update-kubeconfig --name gitops-demo-dev --region us-west-2

Access ArgoCD:

    # Get initial admin password
    kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

    # Port forward
    kubectl port-forward svc/argocd-server -n argocd 8080:443

See Repository Structure above for detailed organization.
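
The admin password command above pipes its output through `base64 -d` because Kubernetes stores the values under a Secret's `data` field base64-encoded. A small sketch of that decoding step, using a made-up value in place of the real `jsonpath` output (`decode_secret_value` and `example-password` are illustrative assumptions):

```shell
#!/usr/bin/env bash
# Sketch: decode a base64-encoded Secret value.
# The encoded string below is a made-up stand-in for what
# `kubectl ... -o jsonpath="{.data.password}"` would print.

# Decodes its first argument from base64 and writes the result to stdout.
decode_secret_value() {
  printf '%s' "$1" | base64 -d
}

# Build a sample encoded value, then decode it again.
encoded="$(printf '%s' 'example-password' | base64)"
decode_secret_value "$encoded"
echo
```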
Module-specific variables are documented in each module's variables.tf:
- `terraform/modules/vpc/variables.tf`: VPC configuration
- `terraform/modules/eks/variables.tf`: EKS cluster settings
- `terraform/modules/argocd/variables.tf`: ArgoCD bootstrap
Module outputs are defined in respective outputs.tf files:
- Cluster endpoint and certificate authority
- VPC and subnet IDs
- ArgoCD initial admin password

Format Terraform code:

    terraform fmt -recursive

Validate configuration:

    terraform validate

Run linting:

    tflint

CI/CD pipeline includes:
- Terraform validation and formatting checks
- Security scanning with tfsec
- Kubernetes manifest validation
- Container image scanning with Trivy
This repo demonstrates canary deployments with automated analysis:

    strategy:
      canary:
        steps:
          - setWeight: 10          # 10% traffic to canary
          - pause: {duration: 2m}
          - analysis:              # Check error rate via Prometheus
              templates:
                - templateName: success-rate
          - setWeight: 50          # Promote to 50%
          - setWeight: 100         # Full rollout

Automatic rollback triggers when:
- Error rate > 1%
- P99 latency > 500ms
- Pod restarts detected
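
The rollback conditions above amount to a simple "any threshold exceeded" check. The sketch below only illustrates that logic. Argo Rollouts evaluates these conditions through its own analysis runs, and `should_rollback` with its three positional arguments is a hypothetical helper, not something defined in this repository.

```shell
#!/usr/bin/env bash
# Sketch only: illustrates the rollback thresholds listed above.
# should_rollback is a hypothetical helper, not an Argo Rollouts feature.

# Usage: should_rollback ERROR_RATE_PERCENT P99_LATENCY_MS POD_RESTARTS
# Returns 0 (roll back) if any threshold is exceeded, 1 otherwise.
should_rollback() {
  awk -v err="$1" -v p99="$2" -v restarts="$3" 'BEGIN {
    # Thresholds from the list above: >1% errors, >500 ms P99, any restart.
    if (err + 0 > 1 || p99 + 0 > 500 || restarts + 0 > 0) exit 0
    exit 1
  }'
}

# Sample readings (made-up numbers): healthy canary, then a failing one.
if should_rollback 0.4 320 0; then echo "rollback"; else echo "promote"; fi
if should_rollback 2.5 320 0; then echo "rollback"; else echo "promote"; fi
```

The comparisons are strict, so a canary sitting exactly at 1% errors or 500 ms P99 is not rolled back, matching the ">" wording in the list.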
See rollouts/canary-strategy.yaml for the full rollout configuration.
Kyverno policies enforce security and best practices:
| Policy | Enforcement | Description |
|---|---|---|
| `require-labels` | Enforce | Standard labels for all workloads |
| `require-resource-limits` | Enforce | CPU/memory limits required |
| `disallow-privileged` | Enforce | No privileged containers |
| `require-probes` | Audit | Liveness/readiness probes |
Pre-configured alerting for GitOps workflows:
- ArgoCD App Out of Sync (>15 min)
- ArgoCD App Health Degraded
- Rollout Stalled (>30 min)
- High Error Rate (>1%)
- High Latency (P99 > 500ms)
Key decisions documented:
- ADR-001: Why ArgoCD for GitOps
- ADR-002: Progressive Delivery with Argo Rollouts
- ADR-003: External Secrets Operator for Secrets

Security measures:
- RBAC: Fine-grained access control for ArgoCD projects
- Network Policies: Namespace isolation and traffic control
- External Secrets: AWS Secrets Manager integration (no secrets in Git)
- Kyverno Policies: Admission control for security standards
- Image Scanning: Trivy integration in CI pipeline
Thomas Vincent — Senior DevOps Engineer
- GitHub: @thomasvincent
- LinkedIn: thomasvincent
- Email: thomasvincent@gmail.com
MIT License - see LICENSE for details.