Introduction
In this blog, we compare two popular ways of running Kubernetes on AWS — Amazon EKS (managed) and kOps (self-managed) — with both provisioned using Terraform. Choosing the right model matters because cluster cost, operational overhead, upgrade strategy, compliance, and automation capabilities all directly affect platform reliability and business outcomes.
While AWS offers UI-based provisioning for EKS and kOps provides direct CLI commands, this blog intentionally uses Terraform. UI or ad-hoc CLI deployment has no versioning, auditability, or repeatability. Terraform, in contrast, enables Infrastructure-as-Code, automation, resource traceability, and controlled lifecycle management. It allows teams to define the entire cluster and supporting services in code, version them, review changes, and re-provision environments consistently across development, staging, and production.
By the end of this blog, you will understand when to choose EKS vs kOps based on cost and operational needs, and why Terraform remains the preferred method for Kubernetes infrastructure provisioning on AWS.
Choosing Between EKS and kOps: Which One Fits Your Use Case?
When deciding between Amazon EKS and kOps for running Kubernetes on AWS, the choice largely depends on how much control, flexibility, and platform engineering maturity you need. EKS provides a managed control plane, easier onboarding, and reduced operational responsibility. However, it limits deep customization and often leads to higher overall cost due to managed control plane pricing, AWS-controlled versions/upgrades, and service add-ons that accumulate over time.
kOps, on the other hand, delivers full control over cluster architecture, networking, node lifecycle, and upgrade flows. It is cloud-agnostic, production-proven, and integrates well into custom platform engineering environments. With kOps you decide how the control plane runs, how networking is configured, how high availability is achieved, and how the infrastructure evolves, down to the EC2 instances, operating systems, VPC, DNS, and Kubernetes versioning. This makes kOps superior for organizations that need customization, vendor neutrality, advanced networking and security policies, or cost optimization at scale. The main trade-off is that it requires deeper Kubernetes and operations expertise; that is an engineering investment rather than a recurring financial cost, and it ultimately improves team capability and infrastructure maturity.
Benefits of Choosing kOps:
- Full architectural and operational control (networking, DNS, OS, versions, HA, etc.)
- Lower ongoing costs since no managed control plane fees
- Easier customization for compliance, security, and topology requirements
- Cloud-agnostic and not locked into AWS-specific managed services
- Better alignment with platform engineering teams who prefer flexibility
- Production-ready with mature HA and upgrade workflows
- Works exceptionally well with Terraform for true end-to-end IaC
Drawbacks of Choosing kOps:
- Requires deeper Kubernetes and infrastructure knowledge
- Responsibility for control plane operations (not a monetary cost, just expertise)
- More moving parts for teams unfamiliar with cluster internals
Benefits of Choosing EKS:
- Managed control plane reduces operational responsibility
- Faster onboarding for less experienced teams
- Simplified upgrades and patching at the control plane level
- Strong AWS ecosystem integrations (IAM, VPC CNI, ALB ingress, etc.)
Drawbacks of Choosing EKS:
- Higher total cost (managed control plane + add-on dependencies)
- Limited deep customization of Kubernetes internals
- AWS-dependent upgrade schedules and compatibility constraints
- Vendor lock-in and reduced infrastructure portability
Prerequisites
Before proceeding, the following prerequisites are recommended:
- AWS Account and IAM Access
- Terraform Installed
- AWS CLI Configured
- kOps Installed (Only for kOps-based Cluster Walkthrough)
```
# Install AWS CLI
# macOS
brew install awscli
# Ubuntu/Debian
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
aws configure
# Enter your AWS Access Key ID, Secret Access Key, and default region (us-west-2)

# Install Terraform
# macOS
brew install terraform
# Ubuntu/Debian
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform

# Install kubectl
# macOS
brew install kubectl
# Ubuntu/Debian
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
kubectl version --client

# Install kOps (adjust the URL for your OS and CPU architecture, e.g. darwin/linux, arm64/amd64)
curl -LO https://github.com/kubernetes/kops/releases/download/v1.30.0/kops-darwin-arm64
chmod +x kops-darwin-arm64
sudo mv kops-darwin-arm64 /usr/local/bin/kops
kops version
```
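One additional check worth doing before provisioning anything is confirming that the AWS credentials you configured actually work and point at the account you expect:

```
# Confirm that the configured credentials are valid and point at the right account
aws sts get-caller-identity
```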
Creating an EKS Cluster Using Terraform
In this section, we will walk through provisioning a production-ready Amazon EKS cluster on AWS using Terraform. The goal is to eliminate manual provisioning steps, ensure repeatability, and maintain infrastructure through version-controlled IaC workflows.
What This Terraform Setup Creates
The Terraform configuration provisions the following resources in AWS:
- A dedicated VPC with public and private subnets across multiple Availability Zones
- Internet Gateway and NAT Gateway for routing public and private workloads
- Route Tables and networking associations required for EKS communication
- EKS Control Plane and cluster endpoints
- Worker Node Groups (EC2 or Managed Node Groups) depending on configuration
- IAM Roles & Policies for cluster operations, nodes, and service accounts
- Security Groups for cluster communication and load balancers
- Associated services required for a fully functional Kubernetes cluster
This ensures that once Terraform completes, you have a working Kubernetes environment where you can deploy workloads, ingress controllers, and add-on services like monitoring, logging, and autoscaling.
Using the Terraform Module
For this demonstration, we use an existing Terraform module hosted in the following GitHub repository, which contains reusable and modular Terraform code specifically structured for EKS provisioning. It abstracts away low-level complexity while keeping the configuration customizable.
Steps to Deploy
Assuming Terraform and AWS CLI are configured locally.
```
git clone https://github.com/sanskar153/EKS-cluster
cd EKS-cluster
terraform init
terraform plan
terraform apply
```
Additional Notes
The repository contains a [README\.md](https://github.com/sanskar153/EKS-cluster?tab=readme-ov-file#aws-eks-infrastructure-with-terraform) which provides reference documentation, variable descriptions, and optional configurations. It is recommended to review the README.md for environment-specific adjustments or module parameters.
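If you prefer to adjust module inputs at plan time rather than editing files, Terraform's -var flag works well for that. Note that the variable names below (cluster_name, region) are placeholders for illustration only, not confirmed inputs of this repository; check its variables.tf and README for the actual names.

```
# Illustrative only: override module inputs on the CLI.
# The variable names cluster_name and region are assumptions, not taken from the repo.
terraform plan \
  -var 'cluster_name=demo-eks' \
  -var 'region=us-west-2'
```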
After running terraform plan, you will notice that a total of 53 resources are scheduled for creation. Review the plan output to confirm all resources, and then proceed with terraform apply.
Once the apply command finishes and the cluster is created, use the following command to connect to it. After running it, you will also notice that your kubeconfig file (located at ~/.kube/config) has been updated with the required authentication configuration.
aws eks update-kubeconfig --region <region> --name <cluster-name>
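A quick way to confirm that the kubeconfig update worked is to list the nodes and the system pods:

```
# Verify connectivity to the new EKS cluster
kubectl get nodes
kubectl get pods -n kube-system
```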
Once provisioning is complete, your EKS cluster is ready for workloads. Application services, monitoring, and observability systems can be deployed through kubectl. You can also deploy my observability stack, linked below for reference.
Creating an In-House Kubernetes Cluster Using kOps + Terraform
kOps is a popular tool for provisioning production-grade Kubernetes clusters, especially for teams that require deeper customization, full infrastructure ownership, and cloud portability. Traditionally, kOps allows you to create clusters directly using the kops create cluster command, which provisions infrastructure on the fly. While this approach works, it lacks strong versioning, auditability, and controlled lifecycle management.
In this guide, we use kOps together with Terraform. Instead of directly applying changes through kOps, we generate Terraform manifests using the --target=terraform flag. This converts the cluster specification into Infrastructure-as-Code, enabling teams to:
- Maintain version control through Git
- Track and review infrastructure changes before applying them
- Integrate cluster provisioning into CI/CD pipelines
- Manage upgrades gracefully through declarative state files
- Enforce consistent and repeatable deployments across environments
A key detail to remember is that Kubernetes version compatibility is tied to the kOps version. If your target is Kubernetes v1.30, your kOps binary must also be v1.30 to ensure proper support and successful cluster provisioning.
Prior to executing the kops create cluster command, an S3 bucket must be created to serve as the kOps state store. kOps relies on this state bucket to track configuration, updates, and cluster lifecycle. After creating the bucket, include it in the command via the --state flag.
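As a rough sketch, the state bucket can be created with the AWS CLI. The bucket name below matches the --state flag used in the next command; since S3 bucket names are globally unique, you may need to pick a different one. Enabling versioning is optional but recommended so earlier cluster specs can be recovered.

```
# Create the kOps state store bucket (name must be globally unique)
aws s3api create-bucket \
  --bucket test-cluster.state \
  --region us-west-2 \
  --create-bucket-configuration LocationConstraint=us-west-2

# Recommended: keep a history of cluster specs
aws s3api put-bucket-versioning \
  --bucket test-cluster.state \
  --versioning-configuration Status=Enabled

# Optional: avoid repeating --state on every kops command
export KOPS_STATE_STORE=s3://test-cluster.state
```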
Define the cluster configuration:
```
kops create cluster \
  --name test.k8s.local \
  --master-zones us-west-2b,us-west-2d \
  --dns-zone cluster.local \
  --zones us-west-2b,us-west-2d \
  --topology private \
  --master-count 3 \
  --master-size t3.large \
  --node-count 2 \
  --node-size t3.large \
  --node-volume-size 50 \
  --ssh-public-key ~/.ssh/id_rsa.pub \
  --networking cilium \
  --network-cidr 172.20.0.0/16 \
  --state s3://test-cluster.state \
  --target terraform \
  --out . \
  --yes
```
Flag-by-Flag Explanation
**--name test.k8s.local**
Specifies the cluster name and DNS domain. For private DNS setups, .k8s.local is commonly used.
**--master-zones us-west-2b,us-west-2d**
Defines which Availability Zones will host the control plane (masters).
Multiple AZs enable high availability for the control plane.
**--dns-zone cluster.local**
Specifies the DNS zone used for resolving cluster internal DNS. In private topologies, .local DNS zones work without public records.
**--zones us-west-2b,us-west-2d**
Defines the Availability Zones for worker nodes. Matching control plane and worker zones improves latency and redundancy.
**--topology private**
Creates a private topology where nodes are launched in private subnets without public IPs. Traffic flows through NAT gateways or bastion hosts for outbound access. This is recommended for production security.
**--master-count 3**
Creates three master nodes for HA. With an odd number of nodes, etcd can maintain quorum during failures.
**--master-size t3.large**
Instance type for control plane nodes. t3.large offers a balanced CPU/RAM profile for moderate workloads.
**--node-count 2**
Creates two worker nodes. These run application workloads (pods).
**--node-size t3.large**
Instance type for worker nodes. Can be tuned based on workload requirements, CPU-heavy vs memory-heavy applications, etc.
**--node-volume-size 50**
Defines worker node root volume size (in GB). Useful for container images, logs, and ephemeral data.
**--ssh-public-key ~/.ssh/id_rsa.pub**
SSH public key used to access control plane and worker nodes via SSH if needed, typically for troubleshooting or system management.
**--networking cilium**
Selects the CNI plugin. Cilium provides advanced L3/L4/L7 observability, eBPF networking, and network security policies.
**--network-cidr 172.20.0.0/16**
CIDR range for the VPC that kOps creates. Node (and, depending on the CNI configuration, pod) IP ranges are carved out of this network.
**--state s3://test-cluster.state**
Specifies the S3 bucket where kOps stores cluster configuration and state.
This acts as the source of truth for cluster management, upgrades, and operations.
**--target terraform**
Instead of provisioning directly, this outputs Terraform manifests.
Allows infrastructure to be version-controlled and applied via Terraform.
**--out .**
Output directory for generated Terraform files (in this case, the current directory).
**--yes**
Confirms execution without prompting for interactive approval.
Once you execute the above kops create cluster command with the --target=terraform flag, kOps will generate two important artifacts in the working directory:
- A kubernetes.tf file that defines all the AWS infrastructure components required to run the cluster (VPC, subnets, route tables, EC2 instances, security groups, IAM roles, etc.).
- A data/ directory that contains Kubernetes manifest data and supporting templates generated by kOps for cluster bootstrapping.
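Before handing these files to Terraform, a quick look at the working directory confirms that both artifacts are present:

```
# Inspect the generated artifacts (output shown is illustrative)
ls
# kubernetes.tf  data/
ls data | head
```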
With these artifacts in place, you can now proceed using Terraform as the execution engine. This gives you full Infrastructure-as-Code capabilities rather than provisioning resources directly through kOps.
At this stage, run the following commands:
```
terraform init
terraform plan
```
terraform plan will show all the AWS resources that are about to be created. After verifying the output and ensuring everything matches expectations, apply the changes with terraform apply.
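If you want the apply step to execute exactly the plan you reviewed, one optional pattern is to save the plan to a file first:

```
# Optional: review and apply the exact same plan
terraform plan -out=tfplan
terraform apply tfplan
```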
Terraform will then create the entire Kubernetes control plane and worker nodes along with the VPC infrastructure, networking, IAM roles, Route53 records (if applicable), and other required resources. This process effectively makes the cluster provisioning repeatable, traceable, and version-controlled through Git.
Once the Terraform apply step completes successfully, your Kubernetes cluster will be deployed at the infrastructure level. The next step is retrieving the kubeconfig so you can interact with the cluster using kubectl. You can obtain the kubeconfig file using kOps:
kops export kubeconfig test.k8s.local --state s3://test-cluster.state --admin=17520h
After exporting, verify connectivity:
kubectl get nodes
If the nodes are in Ready state, your kOps-based Kubernetes cluster is up and running. You can now deploy applications, ingress controllers, monitoring stacks, and observability tooling using kubectl apply just like any other Kubernetes cluster.
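As an illustrative smoke test (any workload will do), you can spin up a throwaway deployment and remove it again:

```
# Throwaway smoke test; not part of the cluster setup itself
kubectl create deployment hello --image=nginx
kubectl get pods -l app=hello
kubectl delete deployment hello
```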
With this setup, you have successfully provisioned your own in-house, production-ready Kubernetes cluster, independent of any managed Kubernetes service. This approach can significantly reduce AWS costs because you are billed only for the underlying EC2 instances, storage, and load balancers, rather than paying an additional hourly fee for a managed control plane. Once the cluster is operational, it becomes crucial to monitor nodes, capture logs, and observe application behavior. For that, you can refer to my complete in-house logging and observability stack guide linked below. See you there.
For those interested in the complete architectural layout of the cluster from the AWS perspective, the [README](https://github.com/sanskar153/kops-k8s-cluster?tab=readme-ov-file#kops-kubernetes-cluster-aws--terraform-kubernetestf) file contains comprehensive documentation. It explains which resources are provisioned, how networking is structured, and how the overall system is wired together, including diagrams and descriptions for better visualization.
From here, you retain full control over future upgrades by modifying the kOps cluster spec, regenerating Terraform manifests, and re-applying them — giving you a clean and auditable lifecycle for Kubernetes operations.
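As a minimal sketch of that cycle, reusing the cluster name and state bucket from this walkthrough, the loop looks roughly like this (for example, after bumping kubernetesVersion in the cluster spec):

```
# Edit the cluster spec, regenerate the Terraform manifests, and re-apply
kops edit cluster test.k8s.local --state s3://test-cluster.state
kops update cluster test.k8s.local \
  --state s3://test-cluster.state \
  --target terraform --out . --yes
terraform plan
terraform apply
# Roll nodes onto the new spec
kops rolling-update cluster test.k8s.local --state s3://test-cluster.state --yes
```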
If you would like to see a follow-up article on performing zero-downtime Kubernetes version upgrades on a kOps-managed cluster, let me know. It is a topic I may cover in a future post. If you found this content valuable, please consider sharing it with your DevOps or SRE peers.