The goal of this tutorial is to host a High-Availability Kubernetes cluster on AWS. You have probably come across the wonderful Kubernetes The Hard Way tutorial by Kelsey Hightower; we will follow a similar approach and build a High-Availability Kubernetes cluster with 3 controllers and 3 worker nodes. This tutorial assumes that you have programmatic access to AWS and the AWS command line. Instead of setting up the servers manually, we'll be scripting everything with the AWS CLI.
Install AWS Command Line
Follow the instructions from Amazon to install the AWS CLI. Ensure that you have configured it with your credentials.
Setup networking
We'll create an AWS VPC to isolate our instances and load balancers.
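For reference, a minimal sketch of that step with the AWS CLI might look like this; the CIDR blocks and the ${TAG} name prefix are assumptions carried through the rest of this post:
# create a VPC and capture its id
VPCID=$(aws ec2 create-vpc --cidr-block 10.10.0.0/16 --query 'Vpc.VpcId' --output text)
aws ec2 create-tags --resources ${VPCID} --tags Key=Name,Value=${TAG}-vpc
# create a subnet inside the VPC for our controllers and workers
SUBNETID=$(aws ec2 create-subnet --vpc-id ${VPCID} --cidr-block 10.10.128.0/24 \
  --query 'Subnet.SubnetId' --output text)
aws ec2 create-tags --resources ${SUBNETID} --tags Key=Name,Value=${TAG}-subnet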
Once the subnet is created, it gets a default route table, but it cannot receive traffic from the internet. To fix that, we'll set up an internet gateway and route traffic from the outside world through it.
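A sketch of the gateway and routing setup, assuming the VPCID and SUBNETID variables from the previous step:
# create an internet gateway and attach it to the VPC
IGWID=$(aws ec2 create-internet-gateway --query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway --internet-gateway-id ${IGWID} --vpc-id ${VPCID}
# route all outbound traffic through the internet gateway via the VPC's default route table
RTBID=$(aws ec2 describe-route-tables --filters Name=vpc-id,Values=${VPCID} \
  --query 'RouteTables[0].RouteTableId' --output text)
aws ec2 create-route --route-table-id ${RTBID} --destination-cidr-block 0.0.0.0/0 --gateway-id ${IGWID}
# give instances launched in the subnet a public IP by default
aws ec2 modify-subnet-attribute --subnet-id ${SUBNETID} --map-public-ip-on-launch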
Now an instance launched within this subnet can be accessed from the outside world.
Provisioning the controllers
Before launching the controller instances, we need to set up a security group that allows access to ports 22 and 6443 from the outside world and opens up all traffic between nodes within the VPC.
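Something along these lines should do; the group name and the intra-VPC CIDR are assumptions:
SGID=$(aws ec2 create-security-group --group-name ${TAG}-sg --description "${TAG} cluster" \
  --vpc-id ${VPCID} --query 'GroupId' --output text)
# SSH and the Kubernetes API from anywhere
aws ec2 authorize-security-group-ingress --group-id ${SGID} --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id ${SGID} --protocol tcp --port 6443 --cidr 0.0.0.0/0
# all traffic between nodes in the VPC
aws ec2 authorize-security-group-ingress --group-id ${SGID} --protocol all --cidr 10.10.0.0/16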
We then create a keypair, which will be used to authenticate over SSH. We'll also add the key to ssh-agent so that we can SSH into the controllers and workers.
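A minimal sketch, assuming a key named ${TAG}-key:
aws ec2 create-key-pair --key-name ${TAG}-key --query 'KeyMaterial' --output text > ${TAG}-key.pem
chmod 400 ${TAG}-key.pem
eval "$(ssh-agent -s)"
ssh-add ${TAG}-key.pem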
We'll use Ubuntu Bionic for the instances. I am launching instances in the ap-south-1 region; please choose the AMI for your region from the link below.
AMIID=ami-ee8ea481 # bionic from https://cloud-images.ubuntu.com/locator/ec2/
Now, we'll create 3 controller instances. I have chosen t2.medium instances with 32 GB of disk space for the controllers. We disable the source/destination check on each instance to enable NAT routing across subnets.
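A sketch of the launch loop, reusing the AMIID, SGID and SUBNETID variables from earlier and assigning the static private IPs 10.10.128.10 to 10.10.128.12:
for i in 0 1 2; do
  IID=$(aws ec2 run-instances --image-id ${AMIID} --instance-type t2.medium \
    --key-name ${TAG}-key --security-group-ids ${SGID} --subnet-id ${SUBNETID} \
    --private-ip-address 10.10.128.1${i} \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=32}' \
    --query 'Instances[0].InstanceId' --output text)
  aws ec2 create-tags --resources ${IID} --tags Key=Name,Value=${TAG}-controller-${i}
  # disable the source/destination check so the node can forward pod traffic
  aws ec2 modify-instance-attribute --instance-id ${IID} --no-source-dest-check
done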
This way I can use the handy awsip and awsid helper functions, which look instances up by their Name tag, without having to memorize public IP addresses or change my SSH config. Note that I am adding the -A flag to forward my agent so that I can SSH to other controllers and workers without having to copy the PEM file to other machines.
ssh -A -l ubuntu $(awsip ${TAG}-controller-0)
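If you don't already have them, a minimal sketch of such helpers might look like this (a rough equivalent, assuming instances are tagged with a Name tag as above):
awsip() {
  aws ec2 describe-instances --filters "Name=tag:Name,Values=$1" \
    "Name=instance-state-name,Values=running" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' --output text
}
awsid() {
  aws ec2 describe-instances --filters "Name=tag:Name,Values=$1" \
    "Name=instance-state-name,Values=running" \
    --query 'Reservations[0].Instances[0].InstanceId' --output text
}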
Setting up Kubernetes components on the controller
Kubernetes uses etcd to store all of its state. etcd is a highly available, fault-tolerant key-value store, similar to Apache ZooKeeper or HashiCorp's Consul. While kubeadm can set up etcd in a distributed mode, I would strongly recommend that you manage etcd outside of Kubernetes' own workload: running etcd within Kubernetes as a static pod introduces a cyclical dependency, and I am not comfortable doing that for database workloads. So we'll be setting up etcd external to Kubernetes, though we'll still use Docker to run it.
In an ideal scenario, you would run the etcd cluster and the Kubernetes controller nodes on separate instances. But, for the sake of simplicity, we are going to run etcd and the Kubernetes controllers on the same instances.
Setting up certificates for etcd
Now let's create a directory to hold our certificates.
mkdir tls
cd tls
Now let's create a CA certificate to sign our certificates.
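A minimal cfssl sketch for the CA; the CN and the signing profile values are assumptions:
cat > ca-config.json <<EOF
{"signing":{"default":{"expiry":"8760h","usages":["signing","key encipherment","server auth","client auth"]}}}
EOF
cat > ca-csr.json <<EOF
{"CN":"etcd-ca","key":{"algo":"rsa","size":2048}}
EOF
cfssl gencert -initca ca-csr.json | cfssljson -bare ca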
This will create the ca.pem and ca-key.pem files. Now let's create certificates for etcd. We'll use the same certificate and key for both etcd peers and etcd clients.
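A sketch of the etcd certificate generation, assuming the controller IPs used in this post:
cat > etcd-csr.json <<EOF
{"CN":"etcd","hosts":["127.0.0.1","10.10.128.10","10.10.128.11","10.10.128.12"],"key":{"algo":"rsa","size":2048}}
EOF
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json etcd-csr.json | cfssljson -bare etcd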
This should create etcd-key.pem and etcd.pem in your tls folder. Now let's copy the tls folder to all the controller instances.
for i in 0 1 2; do
scp -r tls ubuntu@`awsip ${TAG}-controller-${i}`:~
done
Setting up etcd on the instances
These commands have to be run on each of the controllers.
We will create a directory /etc/etcd to hold the etcd certificates we just created and a directory /var/lib/etcd to store etcd's data. We'll then copy the certificates we just uploaded into the /etc/etcd directory.
We'll then get the instance's internal IP so we can bind port 2379 for client traffic and port 2380 for peer traffic. We'll also use the local hostname as etcd's node name; AWS sets the hostname to ip-xx-xx-xx-xx, where xx-xx-xx-xx corresponds to the primary internal IP of the instance.
Now we'll create a systemd service that launches etcd with the appropriate configuration as a Docker container. We'll mount the /etc/etcd and /var/lib/etcd directories from the host into the container, and use host networking so that the container's ports 2379 and 2380 are exposed directly on the host. We'll also set the initial-cluster-state to new and statically declare the initial cluster members.
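A rough sketch of such a unit, written to /etc/systemd/system/etcd.service; the etcd image tag is an assumption, and INTERNAL_IP and ETCD_NAME are assumed to have been set in the previous step:
cat <<EOF | sudo tee /etc/systemd/system/etcd.service
[Unit]
Description=etcd (docker)
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f etcd
ExecStart=/usr/bin/docker run --name etcd --net=host \
  -v /etc/etcd:/etc/etcd -v /var/lib/etcd:/var/lib/etcd \
  quay.io/coreos/etcd:v3.3.8 etcd \
  --name ${ETCD_NAME} --data-dir /var/lib/etcd \
  --listen-client-urls https://${INTERNAL_IP}:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://${INTERNAL_IP}:2379 \
  --listen-peer-urls https://${INTERNAL_IP}:2380 \
  --initial-advertise-peer-urls https://${INTERNAL_IP}:2380 \
  --cert-file=/etc/etcd/etcd.pem --key-file=/etc/etcd/etcd-key.pem \
  --peer-cert-file=/etc/etcd/etcd.pem --peer-key-file=/etc/etcd/etcd-key.pem \
  --trusted-ca-file=/etc/etcd/ca.pem --peer-trusted-ca-file=/etc/etcd/ca.pem \
  --client-cert-auth --peer-client-cert-auth \
  --initial-cluster ip-10-10-128-10=https://10.10.128.10:2380,ip-10-10-128-11=https://10.10.128.11:2380,ip-10-10-128-12=https://10.10.128.12:2380 \
  --initial-cluster-state new
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload && sudo systemctl enable --now etcd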
Ensure that you have run the above commands on all the controllers.
Setup a load balancer
Let's set up a network load balancer and create a target group with just the first controller. Once we bring the other masters up, we'll add them to the target group as well.
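A sketch of the load balancer setup with the elbv2 commands; the resource names are assumptions:
LBARN=$(aws elbv2 create-load-balancer --name ${TAG}-api --type network --subnets ${SUBNETID} \
  --query 'LoadBalancers[0].LoadBalancerArn' --output text)
TGARN=$(aws elbv2 create-target-group --name ${TAG}-api-tg --protocol TCP --port 6443 \
  --vpc-id ${VPCID} --target-type instance \
  --query 'TargetGroups[0].TargetGroupArn' --output text)
aws elbv2 create-listener --load-balancer-arn ${LBARN} --protocol TCP --port 6443 \
  --default-actions Type=forward,TargetGroupArn=${TGARN}
# start with only the first controller registered
aws elbv2 register-targets --target-group-arn ${TGARN} --targets Id=$(awsid ${TAG}-controller-0)
# the load balancer's DNS name is what kubeadm will use as the control-plane endpoint
aws elbv2 describe-load-balancers --load-balancer-arns ${LBARN} \
  --query 'LoadBalancers[0].DNSName' --output text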
Copy the kubeadm join command printed by kubeadm init somewhere safe, as we will be using it to add our worker nodes to the cluster.
Setup the other controllers
To set up controller-1 and controller-2, we need to copy over the certificates we created on the first controller. On the first controller, do the following.
cd /etc/kubernetes/pki
sudo tar -cvf /home/ubuntu/certs.tar ca.crt ca.key sa.key sa.pub front-proxy-ca.crt front-proxy-ca.key
cd $HOME
scp kubeadm.cfg ubuntu@10.10.128.11:
scp certs.tar ubuntu@10.10.128.11:
scp kubeadm.cfg ubuntu@10.10.128.12:
scp certs.tar ubuntu@10.10.128.12:
Now, on controller-1 and controller-2, replace the hostname in the kubeadm configuration to match the full hostname of that node and then do the following.
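Roughly, the steps on each of those controllers are: unpack the certificates into /etc/kubernetes/pki and run kubeadm init with the copied config (a sketch of the 1.11-era external-etcd flow; adjust paths as needed):
sudo mkdir -p /etc/kubernetes/pki
sudo tar -xvf certs.tar -C /etc/kubernetes/pki
sudo kubeadm init --config kubeadm.cfg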
This will also produce join tokens, but we can use the token produced by the first controller to join the workers. Since all the controllers are now up, we can add controller-1 and controller-2 to the target group so that the load balancer routes API requests to all three controller nodes.
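Registering the remaining controllers is just a matter of adding their instance ids to the same target group, for example:
aws elbv2 register-targets --target-group-arn ${TGARN} \
  --targets Id=$(awsid ${TAG}-controller-1) Id=$(awsid ${TAG}-controller-2)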
We can now access the cluster from our dev machine. To do that, we have to copy the file /etc/kubernetes/admin.conf to our machine. Run the following commands on the first controller.
# on first controller
sudo cp /etc/kubernetes/admin.conf ~/kubeconfig.yaml
sudo chown ubuntu:ubuntu ~/kubeconfig.yaml
And now we can copy the file down to the dev machine and access the cluster locally.
scp ubuntu@`awsip ${TAG}-controller-0`:kubeconfig.yaml kubeconfig.yaml
KUBECONFIG=kubeconfig.yaml kubectl get nodes
# this should produce the following output
NAME STATUS ROLES AGE VERSION
ip-10-10-128-10 NotReady master 39m v1.11.0
ip-10-10-128-11 NotReady master 9m v1.11.0
ip-10-10-128-12 NotReady master 4m v1.11.0
The masters are not ready yet since the Pod network is not initialized. We will be using Canal, which sets up Flannel for pod networking and Calico for enforcing network policy.
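A sketch of the Canal installation from the dev machine; the manifest URLs below are the v3.1-era ones and may have moved, so check the Project Calico documentation for the current paths:
KUBECONFIG=kubeconfig.yaml kubectl apply -f \
  https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/canal/rbac.yaml
KUBECONFIG=kubeconfig.yaml kubectl apply -f \
  https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/canal/canal.yaml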
Now that networking is set up, our nodes should be in the Ready state.
KUBECONFIG=kubeconfig.yaml kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-10-128-10 Ready master 50m v1.11.0
ip-10-10-128-11 Ready master 21m v1.11.0
ip-10-10-128-12 Ready master 16m v1.11.0
Provisioning worker nodes
We'll use a similar configuration to that of the masters and provision three t2.medium workers. As with the masters, I am using the Bionic image in the ap-south-1 region. Note that we do not strictly need to allocate static IPs for the worker nodes any more; we can start treating workers as cattle instead of pets. I am only assigning them fixed addresses so that I can identify which nodes are running which pods; this will be used to demonstrate node scheduling in subsequent sections.
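The launch loop mirrors the controllers (a sketch, reusing the earlier variables); afterwards, run the kubeadm join command you saved earlier on each worker:
for i in 0 1 2; do
  IID=$(aws ec2 run-instances --image-id ${AMIID} --instance-type t2.medium \
    --key-name ${TAG}-key --security-group-ids ${SGID} --subnet-id ${SUBNETID} \
    --private-ip-address 10.10.128.2${i} \
    --query 'Instances[0].InstanceId' --output text)
  aws ec2 create-tags --resources ${IID} --tags Key=Name,Value=${TAG}-worker-${i}
  aws ec2 modify-instance-attribute --instance-id ${IID} --no-source-dest-check
done
# then, on each worker, join the cluster with the saved token (placeholders shown)
# sudo kubeadm join <load-balancer-dns>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>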
Now, back on the dev machine, we can run kubectl get nodes to verify that the worker nodes have joined the cluster. Give them a few minutes to get to the Ready state.
KUBECONFIG=kubeconfig.yaml kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-10-128-10 Ready master 1h v1.11.0
ip-10-10-128-11 Ready master 37m v1.11.0
ip-10-10-128-12 Ready master 32m v1.11.0
ip-10-10-128-20 Ready <none> 2m v1.11.0
ip-10-10-128-21 Ready <none> 2m v1.11.0
ip-10-10-128-22 Ready <none> 2m v1.11.0
We have now set up a fully functioning HA cluster.
Cleaning Up
First we delete all the controller & worker instances
for i in 0 1 2; do
aws ec2 terminate-instances --instance-ids $(awsid ${TAG}-controller-${i})
log "Terminated instance ${TAG}-controller-${i}" && delete_tags $(awsid ${TAG}-controller-${i})
aws ec2 terminate-instances --instance-ids $(awsid ${TAG}-worker-${i})
log "Terminated instance ${TAG}-worker-${i}" && delete_tags $(awsid ${TAG}-worker-${i})
done
We then delete the load balancer and network resources
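A sketch of the teardown, reusing the ids captured earlier; wait for the instances to terminate before deleting the security group and network resources:
aws elbv2 delete-load-balancer --load-balancer-arn ${LBARN}
aws elbv2 delete-target-group --target-group-arn ${TGARN}
aws ec2 delete-key-pair --key-name ${TAG}-key
aws ec2 delete-security-group --group-id ${SGID}
aws ec2 detach-internet-gateway --internet-gateway-id ${IGWID} --vpc-id ${VPCID}
aws ec2 delete-internet-gateway --internet-gateway-id ${IGWID}
aws ec2 delete-subnet --subnet-id ${SUBNETID}
aws ec2 delete-vpc --vpc-id ${VPCID}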
We'll be following the installation instructions from the official Kubernetes documentation to install Docker and the other Kubernetes components. I have chosen to install Docker from Docker's APT repository. Ensure that Docker and the Kubernetes components are installed on all the controllers.
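A condensed sketch of those steps on Ubuntu Bionic; refer to the official docs for the authoritative commands and current repository URLs:
sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
# Docker from Docker's APT repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update && sudo apt-get install -y docker-ce
# kubeadm, kubelet and kubectl from the Kubernetes APT repository
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update && sudo apt-get install -y kubelet kubeadm kubectl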
Now let's create certificates for etcd. We'll be using cfssl to generate certificates for our etcd cluster. If you are on OSX, you can install cfssl via Homebrew.
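On macOS, for example:
brew install cfssl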
Let us create a kubeadm configuration. We'll be using Flannel for our pod network and Calico for enforcing NetworkPolicy.
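A rough sketch of what such a config might look like for the 1.11-era kubeadm (v1alpha2); the field names are from memory and the load-balancer DNS name is a placeholder, so verify against the kubeadm reference for your version:
cat > kubeadm.cfg <<EOF
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.0
apiServerCertSANs:
- "<load-balancer-dns-name>"
api:
  controlPlaneEndpoint: "<load-balancer-dns-name>:6443"
etcd:
  external:
    endpoints:
    - https://10.10.128.10:2379
    - https://10.10.128.11:2379
    - https://10.10.128.12:2379
    caFile: /etc/etcd/ca.pem
    certFile: /etc/etcd/etcd.pem
    keyFile: /etc/etcd/etcd-key.pem
networking:
  podSubnet: 10.244.0.0/16
EOF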
We'll be following the same installation instructions from the official Kubernetes documentation that we used for the controllers to set up the Docker and Kubernetes tools. These steps are virtually identical to those for the masters.