Pragmatic GitOps on AWS EKS: Beyond the Hello World Demo

How to build a production-ready Kubernetes delivery platform using Argo CD, Karpenter, and OIDC without the enterprise bloat.

Emeka Okafor
Security Editor · Jun 27, 2026 · 6 min read

Most DevOps portfolio projects follow a predictable, tired pattern. A developer deploys a single "Hello World" container to a managed Kubernetes cluster, writes a basic README, and calls it a production-ready platform. In reality, production engineering only begins when you have to manage state, enforce strict security boundaries, handle autoscaling at multiple layers, and coordinate deployments without exposing cluster credentials to the outside world.

Building a real-world platform requires making hard architectural trade-offs. While cloud providers push heavy, multi-tenant SaaS blueprints (such as AWS templates featuring Flux and complex workflows), a lean, single-operator GitOps platform built on Amazon EKS is often the most resilient baseline for mid-sized engineering teams.

Analyzing a recent, highly practical implementation based on the Spring PetClinic microservices architecture (a system with seven independent services, service discovery, an API gateway, and distributed tracing) reveals the exact boundaries where Terraform ends, Kubernetes begins, and GitOps takes over.

The Security Boundary: Eliminating Static Credentials

Traditional CI/CD pipelines rely on push-based deployments. A runner in GitHub Actions or GitLab CI executes kubectl apply or helm upgrade at the end of a build. This approach is a security liability. It requires storing long-lived AWS IAM credentials or cluster admin kubeconfig files inside the CI provider's secret store. If the CI provider is breached, your cluster is compromised.

Inverting this model is the core tenet of GitOps. By using Argo CD, the cluster pulls changes from Git, meaning the CI pipeline never needs direct access to the Kubernetes API.

flowchart TD
    Dev[Push to GitHub main] --> GHA[GitHub Actions CI]
    GHA -->|OIDC Role - No Static Keys| ECR[Amazon ECR]
    GHA -->|Bump Image Tag & Commit| Git[Helm Chart in Git]
    Git --> Argo[Argo CD]
    Argo -->|Sync| Cluster[EKS Cluster]
    ECR --> Cluster

To secure the container push to Amazon ECR, the CI runner should use OpenID Connect (OIDC) to assume an IAM role dynamically, which removes static AWS access keys from the pipeline entirely. The GitHub Actions runner requests a signed OIDC token from GitHub, exchanges it with AWS STS for short-lived credentials, pushes the newly built Docker image to ECR, and then commits the new image tag to the Helm chart in Git.
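The CI half of that flow can be sketched as a GitHub Actions workflow. This is a minimal sketch, not the original project's pipeline: the workflow path, the IAM role name, the single `customers-service` build, and the Dockerfile at the repository root are all assumptions made for illustration.

```yaml
# .github/workflows/build.yml (hypothetical path and names throughout)
name: build-and-bump

on:
  push:
    branches: [main]

permissions:
  id-token: write   # lets the job request a GitHub OIDC token
  contents: write   # lets the job commit the new image tag

env:
  AWS_REGION: eu-central-1
  SERVICE: customers-service   # illustrative; a real pipeline would build every service

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Exchange the OIDC token for short-lived credentials; no access keys are stored.
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-ecr-push   # assumed role name
          aws-region: ${{ env.AWS_REGION }}

      - id: ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        run: |
          IMAGE="${{ steps.ecr.outputs.registry }}/${SERVICE}:${GITHUB_SHA}"
          docker build -t "$IMAGE" .
          docker push "$IMAGE"

      - name: Bump image tag in the Helm values
        run: |
          # Rewrites the single "tag:" line in values.yaml with the commit SHA.
          sed -i "s|^\(\s*tag:\).*|\1 \"${GITHUB_SHA}\"|" helm/petclinic/values.yaml
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git commit -am "ci: bump image tag to ${GITHUB_SHA}"
          git push
```

Commits pushed with the default `GITHUB_TOKEN` do not trigger further workflow runs, so the tag bump does not loop back into another build. The IAM role's trust policy should restrict which repository and branch may assume it; that policy is not shown here.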

From there, Argo CD, running inside the EKS cluster, detects the Git commit and pulls the new state. The audit trail for every deployment lives entirely within your Git history, not in ephemeral CI logs.
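The pull side can be expressed as an Argo CD `Application` that points at the chart directory. The repository URL, the `petclinic` namespace, and the automated sync settings below are assumptions for illustration; the original project may configure these differently.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: petclinic
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/petclinic-platform.git   # hypothetical repo
    targetRevision: main
    path: helm/petclinic
    helm:
      valueFiles:
        - values.yaml
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: petclinic
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made directly in the cluster
    syncOptions:
      - CreateNamespace=true
```

With `automated` sync enabled, a new commit on `main` is enough to roll out a change. Leaving that block out would require a manual sync for each release, which some teams prefer for production.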

The Terraform-to-Kubernetes Handoff

One of the most common mistakes in platform engineering is letting Terraform manage Kubernetes resources directly. While Terraform has Helm and Kubernetes providers, using them to deploy application-level resources often leads to state corruption, slow plan phases, and dependency cycles during cluster upgrades.

Instead, draw a hard line: Terraform provisions the foundational infrastructure, and GitOps manages everything inside the cluster.

1. The Foundational Infrastructure (Terraform)

Before provisioning the EKS cluster, the remote state backend must be locked down. This means an S3 bucket with versioning, encryption, and public access blocked, paired with a DynamoDB lock table. The Terraform modules should be split cleanly:

  • VPC Module: Configured across multiple availability zones with public and private subnets. It must include the specific subnet tags that the AWS Load Balancer Controller and Karpenter require for auto-discovery.
  • EKS Module: Built on Kubernetes 1.33, with a managed node group for the baseline worker nodes (AWS operates the control plane itself) and with IAM Roles for Service Accounts (IRSA) enabled alongside EKS Pod Identity.
  • ECR Module: Configured with one repository per microservice, enforcing image scanning on push to catch vulnerabilities early.

2. The Bootstrap Script

Once Terraform finishes, a bootstrap script should install the core cluster add-ons (such as the AWS Load Balancer Controller, metrics-server, and Argo CD) in the correct dependency order. This handoff ensures that Terraform remains a static infrastructure tool, while Argo CD becomes the controller for all subsequent software deployments.

Packaging Seven Services into One Helm Chart

Managing raw Kubernetes manifests for a microservices application is a maintenance nightmare. Seven services with separate deployments, services, virtual services, and horizontal pod autoscalers result in thousands of lines of duplicated YAML.

Converting these manifests into a single, values-driven Helm chart simplifies configuration. Instead of maintaining seven separate charts, you can define one shared set of templates and parameterize the differences in a central values.yaml file:

# helm/petclinic/values.yaml
global:
  image:
    registry: 123456789012.dkr.ecr.eu-central-1.amazonaws.com
    tag: "git-sha-placeholder" # Updated automatically by CI

services:
  api-gateway:
    replicaCount: 2
    port: 8080
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
  customers-service:
    replicaCount: 2
    port: 8081
  vets-service:
    replicaCount: 2
    port: 8082

This structure replaces the duplicated per-service manifests with one set of templates and a single values file. When the CI pipeline runs, it only needs to update the global.image.tag value in that file and commit the change. Argo CD then rolls the new tag out to all seven deployments in one sync.
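A template that consumes those values might look like the sketch below. The file name is hypothetical, and it assumes each ECR repository is named after its key under `services`; the loop itself uses standard Helm constructs (`range`, `with`, `toYaml`, `nindent`).

```yaml
# helm/petclinic/templates/deployment.yaml (hypothetical template)
{{- range $name, $svc := .Values.services }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ $name }}
  labels:
    app: {{ $name }}
spec:
  replicas: {{ $svc.replicaCount }}
  selector:
    matchLabels:
      app: {{ $name }}
  template:
    metadata:
      labels:
        app: {{ $name }}
    spec:
      containers:
        - name: {{ $name }}
          # Assumes one ECR repository per service, named after the service key.
          image: "{{ $.Values.global.image.registry }}/{{ $name }}:{{ $.Values.global.image.tag }}"
          ports:
            - containerPort: {{ $svc.port }}
          {{- with $svc.resources }}
          resources:
            {{- toYaml . | nindent 12 }}
          {{- end }}
{{- end }}
```

Running `helm template helm/petclinic` renders one Deployment per entry under `services`, so adding a service becomes a few lines in values.yaml rather than a new manifest.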

Two-Layer Autoscaling: HPA and Karpenter

Autoscaling in a production environment must happen at both the pod level and the node level.

The Horizontal Pod Autoscaler (HPA) monitors CPU and memory metrics via the metrics-server and scales the application pods when load spikes. However, the new pods simply stay Pending if the underlying EC2 nodes have no spare capacity.
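For the pod layer, an HPA for a single service could be declared as follows. The replica bounds and the 70% CPU target are illustrative values, not figures from the original project.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: customers-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: customers-service
  minReplicas: 2
  maxReplicas: 6                   # illustrative ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold
```

Utilization targets are calculated against the container's CPU request, so the Deployment must define requests (or limits, which Kubernetes copies into requests when none are given) for this HPA to produce a metric.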

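Traditional setups rely on the Kubernetes Cluster Autoscaler, which adds nodes by resizing AWS Auto Scaling Groups (ASGs). This tends to be slower and less flexible, because each ASG is tied to a predefined node configuration and knows nothing about the pods waiting to be scheduled.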

Karpenter replaces the Cluster Autoscaler by bypassing ASGs entirely. It talks directly to the EC2 Fleet API, evaluates the unschedulable pods, and launches a suitable EC2 instance type, typically within about a minute.

To set up Karpenter securely, use EKS Pod Identity to grant the Karpenter controller permission to provision EC2 instances. Additionally, deploy an Amazon SQS queue that receives Spot interruption warnings forwarded by EventBridge rules. When a Spot interruption notice arrives, roughly two minutes before the instance is reclaimed, Karpenter cordons and drains the node and provisions a replacement. This keeps disruption minimal for services that run more than one replica, although it cannot guarantee zero downtime on its own.
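On the node layer, a Karpenter `NodePool` describes which instances may be launched. The sketch below assumes the `karpenter.sh/v1` API and an existing `EC2NodeClass` named `default`; the CPU limit, instance categories, and consolidation settings are illustrative choices.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default              # assumes an EC2NodeClass with this name exists
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # prefer Spot, fall back to On-Demand
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "r"]       # general-purpose and memory-optimized families
  limits:
    cpu: "64"                      # illustrative cap on total provisioned vCPUs
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```

The SQS queue for interruption handling is wired in separately, through the Karpenter Helm chart's `settings.interruptionQueue` value rather than through the NodePool.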

Real-World Gotchas and Architectural Trade-offs

No architecture is perfect, and a solo build highlights several friction points that enterprise teams often gloss over:

  • Service Discovery Redundancy: The Spring PetClinic application natively uses Spring Cloud Eureka for service discovery and Spring Cloud Config Server for configuration. In a modern Kubernetes environment, this is redundant. Kubernetes has built-in DNS for service discovery and ConfigMaps/Secrets for configuration. Keeping the Spring Cloud infrastructure introduces unnecessary JVM memory overhead and startup latency. For greenfield projects, rely on native Kubernetes primitives.
  • The Secret Management Gap: Storing database credentials or API keys in a Git repository is a critical security failure, and doubly so when the repository is public. A basic GitOps setup might tempt you to commit base64-encoded Kubernetes Secrets, but base64 is an encoding, not encryption. Production environments need either an external secrets operator (such as External Secrets Operator integrated with AWS Secrets Manager) or Sealed Secrets, so that no plaintext secret ever touches Git.
  • Observability Overhead: Running Prometheus, Grafana, and Zipkin inside the same cluster is excellent for debugging, but it consumes significant memory, and Java microservices are notoriously heavy on startup. Ensure your Karpenter node pools allow instance types with enough memory (such as the memory-optimized r6i family or the general-purpose m6i family) to avoid memory pressure and out-of-memory (OOM) kills during heavy deployment phases.
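To make the secret management point concrete, here is a minimal sketch of an External Secrets Operator resource. The store name, the Secrets Manager key, and the `petclinic` namespace are hypothetical, and the `apiVersion` depends on the operator version installed (newer releases also serve `external-secrets.io/v1`).

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: customers-db
  namespace: petclinic
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager         # assumed store, configured separately
  target:
    name: customers-db                # Kubernetes Secret the operator creates
  data:
    - secretKey: password
      remoteRef:
        key: petclinic/customers-db   # hypothetical Secrets Manager secret
        property: password
```

Only this reference is committed to Git. The operator reads the value from AWS Secrets Manager at runtime and materializes it as a regular Kubernetes Secret inside the cluster.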

This architecture shows that a single engineer can build a resilient, production-grade delivery platform. By focusing on OIDC security boundaries, decoupling CI from CD, and relying on Karpenter for rapid node provisioning, you get a system that is easy to maintain, inexpensive to run, and secure by default.

Sources & further reading

  1. How I Built a Production-Style GitOps Platform on AWS EKS — Solo, From Scratch — dev.to
  2. GitHub - aws-samples/eks-saas-gitops: Source repo of Building SaaS applications on Amazon EKS using GitOps workshop — github.com
  3. GitOps | Amazon EKS Workshop — eksworkshop.com
Written by
Emeka Okafor · Security Editor

Emeka has spent over a decade tracking threat actors, vulnerability disclosures, and the evolving landscape of application security, bringing a sharp continent-spanning perspective to his reporting. He's known for translating dense CVE advisories into clear, actionable context that developers and security teams alike actually read.
