README
The goal of this tutorial is to host a High-Availability Kubernetes Cluster on AWS. You have probably come across the wonderful Kubernetes the Hard Way tutorial by Kelsey Hightower. We will follow a similar setup and build a High-Availability Kubernetes cluster with 3 controllers and 3 worker nodes. This tutorial assumes that you have programmatic access to AWS and the AWS CLI. Instead of bootstrapping the components manually, we'll be using kubeadm.
Install AWS Command Line
Follow the installation instructions from Amazon to install the AWS CLI. Ensure that you have configured the CLI.
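If you haven't configured credentials yet, something along these lines will do (the prompts are interactive); the sts call is just a sanity check that the CLI can reach your account.
aws configure
# Verify that the CLI is authenticated against the right account.
aws sts get-caller-identity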
Setup networking
We'll create an AWS VPC to isolate our instances and load balancers.
TAG="awsklstr"
VPCID=$(aws ec2 create-vpc --cidr-block 10.10.0.0/16 | jq -r .Vpc.VpcId)
aws ec2 create-tags --resources $VPCID --tags Key=Name,Value=$TAG
aws ec2 modify-vpc-attribute --enable-dns-hostnames --vpc-id $VPCID
aws ec2 modify-vpc-attribute --enable-dns-support --vpc-id $VPCID
Once the VPC is created, we need to set up a subnet.
SUBNETID=$(aws ec2 create-subnet --vpc-id=$VPCID --cidr-block=10.10.128.0/17 | jq -r .Subnet.SubnetId)
aws ec2 create-tags --resources $SUBNETID --tags Key=Name,Value=$TAG
Once the subnet is created, it is associated with a default route table, but it cannot receive traffic from the internet. To allow that, we will set up an internet gateway and route traffic from the outside world into the subnet.
RTBID=$(aws ec2 create-route-table --vpc-id $VPCID | jq -r .RouteTable.RouteTableId)
aws ec2 create-tags --resources $RTBID --tags Key=Name,Value=$TAG
aws ec2 associate-route-table --subnet-id $SUBNETID --route-table-id $RTBID
IGWID=$(aws ec2 create-internet-gateway | jq -r .InternetGateway.InternetGatewayId)
aws ec2 create-tags --resources $IGWID --tags Key=Name,Value=$TAG
aws ec2 attach-internet-gateway --internet-gateway-id $IGWID --vpc-id $VPCID
aws ec2 create-route --route-table-id $RTBID --destination-cidr-block 0.0.0.0/0 --gateway-id $IGWID
Now an instance launched within this subnet can be accessed from the outside world.
Provisioning the controllers
Before launching the controller instances, we need to set up a security group that allows access to ports 22 and 6443 from the outside world and allows all traffic between nodes within the VPC.
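A sketch of that security group, reusing the $TAG and $VPCID variables from the networking section; the group name and the intra-VPC rule's CIDR (the VPC block we created above) are my own choices.
SGID=$(aws ec2 create-security-group --group-name $TAG --description "$TAG cluster" --vpc-id $VPCID | jq -r .GroupId)
aws ec2 create-tags --resources $SGID --tags Key=Name,Value=$TAG
# SSH and the Kubernetes API server from anywhere.
aws ec2 authorize-security-group-ingress --group-id $SGID --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id $SGID --protocol tcp --port 6443 --cidr 0.0.0.0/0
# All traffic between nodes inside the VPC.
aws ec2 authorize-security-group-ingress --group-id $SGID --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=10.10.0.0/16}]'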
We then create a keypair. This keypair will be used to authenticate via SSH. We'll also add the key to ssh-agent to allow us to ssh into the controllers and workers.
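One way to do that, reusing $TAG as the key name (the local .pem filename is just a convention):
aws ec2 create-key-pair --key-name $TAG --query KeyMaterial --output text > $TAG.pem
chmod 400 $TAG.pem
# Add the key to the local ssh-agent so agent forwarding works later.
ssh-add $TAG.pem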
We'll use Ubuntu Bionic to launch the instances. I am launching instances in the ap-south-1 region. Please choose the AMI for your region from the link below.
Now, we'll create 3 controller instances. I have chosen t2.medium instances with 32 GB of disk space for the controllers. We disable the source/destination check on the instances to enable NAT routing across subnets.
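A sketch of the launch loop; the AMI ID is a placeholder for the Ubuntu Bionic image in your region, the static private IPs are arbitrary addresses inside our subnet, and /dev/sda1 assumes the usual Ubuntu root device name.
AMI_ID=ami-XXXXXXXXXXXXXXXXX   # placeholder - Ubuntu Bionic AMI for your region

for i in 0 1 2; do
  INSTANCEID=$(aws ec2 run-instances \
    --image-id $AMI_ID \
    --instance-type t2.medium \
    --key-name $TAG \
    --security-group-ids $SGID \
    --subnet-id $SUBNETID \
    --associate-public-ip-address \
    --private-ip-address 10.10.128.1$i \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=32}' \
    | jq -r '.Instances[0].InstanceId')
  aws ec2 create-tags --resources $INSTANCEID --tags Key=Name,Value=controller-$i
  # Disable the source/destination check so the instance can route pod traffic.
  aws ec2 modify-instance-attribute --instance-id $INSTANCEID --no-source-dest-check
done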
A Quick Aside
I use these handy functions to quickly interact with the AWS CLI.
This way I can connect to instances without having to memorize public IP addresses or change my SSH config. Note that I am adding the -A flag to forward my SSH agent so that I can ssh to other controllers and workers without having to copy the PEM file to those machines.
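The original helpers are not reproduced here, but functions in this spirit (the names aws-ip and aws-ssh are my own) look up an instance's public IP by its Name tag and wrap ssh:
# Public IP of a running instance, looked up by its Name tag (e.g. controller-0).
aws-ip() {
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=$1" "Name=instance-state-name,Values=running" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' --output text
}

# SSH into an instance by name, forwarding the local ssh-agent with -A.
aws-ssh() {
  ssh -A ubuntu@$(aws-ip $1)
}
With these in place, aws-ssh controller-0 drops me onto the first controller.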
Setting up Kubernetes components on the controller
We'll be following the instructions on the KubeAdm Install page to install Docker and the other Kubernetes components. I have chosen to install Docker from Docker's APT repository. Ensure that the Docker and Kubernetes components are installed on all the controllers.
Let's install Docker from Docker's repository.
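Roughly, run as root on every controller (these follow Docker's Ubuntu install docs; pin a specific Docker version here if your kubeadm release requires one):
apt-get update
apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common

# Add Docker's GPG key and APT repository, then install Docker.
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
apt-get install -y docker-ce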
Now let's install the Kubernetes components from Kubernetes' APT repository.
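Again as root on every controller; these are the repository and packages from the kubeadm install docs of the time (the APT repository location has changed in newer Kubernetes releases, so check the current docs):
apt-get update && apt-get install -y apt-transport-https curl

# Add the Kubernetes APT repository and install the tools.
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl

# Hold the packages so a routine apt upgrade doesn't move the cluster version under us.
apt-mark hold kubelet kubeadm kubectl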
Installing etcd
Kubernetes uses etcd to store all of its state. etcd is a highly available, fault-tolerant key-value store, similar to Apache ZooKeeper or HashiCorp's Consul. While kubeadm can set up etcd in a distributed mode, I would strongly recommend that you manage etcd outside of Kubernetes' workloads. Running etcd within Kubernetes as a static pod introduces a cyclical dependency, and I am not comfortable with that for database workloads. So we'll be setting up etcd external to Kubernetes, while still using Docker to run it.
In an ideal scenario, you might want to run the etcd cluster and the Kubernetes controller nodes on separate instances. But for the sake of simplicity, we are going to run etcd and the Kubernetes controllers on the same instances.
Now let's create certificates for etcd. We'll be using cfssl to generate certificates for our etcd cluster. If you are on macOS, you can install cfssl via Homebrew.
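For example:
brew install cfssl
On Linux, grab the cfssl and cfssljson binaries from the cfssl releases page instead.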
Setting up certificates for etcd
Now let's create a directory to hold our certificates.
Now let's create a CA certificate to sign our certificates.
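A sketch of both steps, starting with the tls directory; the profile name, CN, and expiry below are my own choices rather than anything prescribed.
mkdir -p tls && cd tls

# Signing policy for the CA - a single profile used for etcd peers and clients.
cat > ca-config.json <<EOF
{
  "signing": {
    "default": { "expiry": "8760h" },
    "profiles": {
      "etcd": {
        "usages": ["signing", "key encipherment", "server auth", "client auth"],
        "expiry": "8760h"
      }
    }
  }
}
EOF

# CSR for the CA itself.
cat > ca-csr.json <<EOF
{
  "CN": "etcd-ca",
  "key": { "algo": "rsa", "size": 2048 }
}
EOF

cfssl gencert -initca ca-csr.json | cfssljson -bare ca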
This will create the ca.pem and ca-key.pem files. Now let's create certs for etcd. We'll be using the same cert and key for both etcd peers and etcd clients.
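A sketch of the etcd certificate; the hosts list must contain every address etcd will be reached on, here 127.0.0.1 plus the controllers' private IPs from the launch sketch above, so adjust it to your instances.
# CSR for the shared etcd peer/client certificate.
cat > etcd-csr.json <<EOF
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "10.10.128.10",
    "10.10.128.11",
    "10.10.128.12"
  ],
  "key": { "algo": "rsa", "size": 2048 }
}
EOF

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json \
  -profile=etcd etcd-csr.json | cfssljson -bare etcd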
This should create etcd-key.pem and etcd.pem in your tls folder. Now let's copy the tls folder to all the controller instances.
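Using the aws-ip helper from earlier (or the public IPs directly):
cd ..   # back to the directory that contains tls/
for host in controller-0 controller-1 controller-2; do
  scp -r tls ubuntu@$(aws-ip $host):~/
done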
Setting up etcd on instances
These commands have to be run on each of the controllers.
We will create a directory /etc/etcd to hold the etcd certificates we just created and a directory /var/lib/etcd to store etcd's data. We'll also copy the certificates we just uploaded into the /etc/etcd directory.
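On each controller:
sudo mkdir -p /etc/etcd /var/lib/etcd

# The tls folder was copied into the ubuntu user's home directory earlier.
sudo cp ~/tls/ca.pem ~/tls/etcd.pem ~/tls/etcd-key.pem /etc/etcd/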
We'll then get the instance's internal IP to bind port 2379 for listening to clients and port 2380 for listening to peers. We'll also use the local hostname as etcd's node name. AWS sets the hostname to ip-xx-xx-xx-xx, where xx.xx.xx.xx is the primary internal IP of the instance.
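Both values can be pulled from the instance itself; the metadata endpoint below is the standard EC2 one.
INTERNAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
ETCD_NAME=$(hostname -s)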
Now we'll create a systemd service to launch etcd with the appropriate configuration as a Docker container. We'll mount the /etc/etcd and /var/lib/etcd directories from the host onto the container. We'll also use host networking on the Docker container so that the container's ports 2379 and 2380 are exposed directly on the host. Finally, we'll set the initial-cluster-state to new and statically declare the initial cluster members.
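A sketch of such a unit, assuming the etcd v3.3 image from quay.io and the certificate paths above; the member list bakes in the controller private IPs and hostnames from my earlier launch sketch, so substitute your own. The INTERNAL_IP and ETCD_NAME variables from the previous step are expanded when the file is written.
cat <<EOF | sudo tee /etc/systemd/system/etcd.service
[Unit]
Description=etcd (Docker container)
After=docker.service
Requires=docker.service

[Service]
# Remove any stale container from a previous run before starting a fresh one.
ExecStartPre=-/usr/bin/docker rm -f etcd
ExecStart=/usr/bin/docker run --rm --name etcd \\
  --net=host \\
  -v /etc/etcd:/etc/etcd \\
  -v /var/lib/etcd:/var/lib/etcd \\
  quay.io/coreos/etcd:v3.3.9 /usr/local/bin/etcd \\
  --name ${ETCD_NAME} \\
  --data-dir /var/lib/etcd \\
  --cert-file=/etc/etcd/etcd.pem \\
  --key-file=/etc/etcd/etcd-key.pem \\
  --peer-cert-file=/etc/etcd/etcd.pem \\
  --peer-key-file=/etc/etcd/etcd-key.pem \\
  --trusted-ca-file=/etc/etcd/ca.pem \\
  --peer-trusted-ca-file=/etc/etcd/ca.pem \\
  --client-cert-auth --peer-client-cert-auth \\
  --listen-client-urls https://${INTERNAL_IP}:2379,https://127.0.0.1:2379 \\
  --advertise-client-urls https://${INTERNAL_IP}:2379 \\
  --listen-peer-urls https://${INTERNAL_IP}:2380 \\
  --initial-advertise-peer-urls https://${INTERNAL_IP}:2380 \\
  --initial-cluster ip-10-10-128-10=https://10.10.128.10:2380,ip-10-10-128-11=https://10.10.128.11:2380,ip-10-10-128-12=https://10.10.128.12:2380 \\
  --initial-cluster-token etcd-cluster-0 \\
  --initial-cluster-state new
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF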
Now that we have the service definition created, let's start it with systemd.
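On each controller:
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd

# Follow the logs while the members find each other.
sudo journalctl -u etcd -f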
Ensure that you have run the above commands on all the controllers.
Setup a load balancer
Let's set up a network load balancer and create a target group with just the first controller. Once we bring the other masters up, we'll add them to the target group as well.
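A sketch using the variables from earlier; the load balancer and target group names simply reuse $TAG.
# Network load balancer in our public subnet.
NLBARN=$(aws elbv2 create-load-balancer --name $TAG --type network --subnets $SUBNETID \
  | jq -r '.LoadBalancers[0].LoadBalancerArn')

# Target group for the Kubernetes API server (port 6443), registered by instance ID.
TGARN=$(aws elbv2 create-target-group --name $TAG --protocol TCP --port 6443 \
  --vpc-id $VPCID --target-type instance \
  | jq -r '.TargetGroups[0].TargetGroupArn')

# Forward TCP 6443 on the load balancer to the target group.
aws elbv2 create-listener --load-balancer-arn $NLBARN --protocol TCP --port 6443 \
  --default-actions Type=forward,TargetGroupArn=$TGARN

# Register only the first controller for now.
CONTROLLER0=$(aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=controller-0" "Name=instance-state-name,Values=running" \
  --query 'Reservations[0].Instances[0].InstanceId' --output text)
aws elbv2 register-targets --target-group-arn $TGARN --targets Id=$CONTROLLER0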
Let's get the public DNS address of our load balancer.
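For example:
KUBEAPI=$(aws elbv2 describe-load-balancers --names $TAG | jq -r '.LoadBalancers[0].DNSName')
echo $KUBEAPI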
Initialize the first controller
Let us create a KubeAdm configuration. We'll be using Canal for our pod network and for enforcing NetworkPolicy.
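The exact schema depends on your kubeadm version; with the v1beta2 config API, something along these lines points kubeadm at the external etcd cluster and the load balancer, and sets the pod CIDR that Canal's Flannel backend expects. The etcd endpoints are the controller private IPs from my earlier sketch, and ${KUBEAPI} is the load balancer DNS name from the previous step.
cat > kubeadm-config.yaml <<EOF
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "${KUBEAPI}:6443"
etcd:
  external:
    endpoints:
    - https://10.10.128.10:2379
    - https://10.10.128.11:2379
    - https://10.10.128.12:2379
    caFile: /etc/etcd/ca.pem
    certFile: /etc/etcd/etcd.pem
    keyFile: /etc/etcd/etcd-key.pem
networking:
  podSubnet: 10.244.0.0/16
EOF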
We can now copy this configuration to the first controller.
Run the following command to initialize the cluster.
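Roughly, using the helpers from earlier: copy the config up, then run kubeadm init on the first controller.
scp kubeadm-config.yaml ubuntu@$(aws-ip controller-0):~/
aws-ssh controller-0

# On controller-0:
sudo kubeadm init --config=kubeadm-config.yaml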
This should print a kubeadm join command containing a join token.
Copy it somewhere safe, as we will be using it to add our worker nodes to the cluster.
Setup the other controllers
For setting up controller-1 and controller-2, we need to copy over the certificates that were created on the first controller. On the first controller, do the following.
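A sketch of that, bundling the cluster CA, service-account and front-proxy keys (the etcd certificates are already on every controller), along with the kubeadm config; agent forwarding from the -A flag lets us scp straight to the other controllers' private IPs.
# On controller-0:
sudo tar czf pki.tar.gz -C /etc/kubernetes/pki \
  ca.crt ca.key sa.key sa.pub front-proxy-ca.crt front-proxy-ca.key

for ip in 10.10.128.11 10.10.128.12; do
  scp pki.tar.gz kubeadm-config.yaml ubuntu@$ip:~/
done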
Now on controller-1 and controller-2, replace the hostname to match the full hostname of the node and then do the following.
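Which might look like this on each of the two controllers:
# Restore the certificates shared from the first controller.
sudo mkdir -p /etc/kubernetes/pki
sudo tar xzf ~/pki.tar.gz -C /etc/kubernetes/pki

# Initialize this control-plane node with the (hostname-adjusted) config.
sudo kubeadm init --config=kubeadm-config.yaml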
This will also produce join tokens, but we can use the token produced by the first controller to join the workers. Since all the controllers are up, we can add controller-1 and controller-2 to the target group so that the load balancer routes API requests to all three controller nodes.
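Reusing the $TGARN variable from the load balancer step:
for name in controller-1 controller-2; do
  ID=$(aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=$name" "Name=instance-state-name,Values=running" \
    --query 'Reservations[0].Instances[0].InstanceId' --output text)
  aws elbv2 register-targets --target-group-arn $TGARN --targets Id=$ID
done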
Accessing the cluster from our dev machine
We can now access the cluster from our dev machine. To do that, we have to copy the file /etc/kubernetes/admin.conf to our machine. Run the following commands on the first controller.
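One way to do that is to make a copy that the ubuntu user can read:
sudo cp /etc/kubernetes/admin.conf ~/admin.conf
sudo chown ubuntu:ubuntu ~/admin.conf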
And now we can copy the file down to the dev machine and access the cluster locally.
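For example, keeping it under ~/.kube (the filename is my own convention):
mkdir -p ~/.kube
scp ubuntu@$(aws-ip controller-0):~/admin.conf ~/.kube/awsklstr.conf

# Point kubectl at the cluster.
export KUBECONFIG=~/.kube/awsklstr.conf
kubectl get nodes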
The masters are not ready yet since the Pod network is not initialized. We will be using Canal, which sets up Flannel for pod networking and Calico for enforcing network policy.
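Installing Canal boils down to applying the manifest(s) published in the Project Calico documentation for your Calico version; download them first so you can review what gets installed (the filename below is illustrative).
# canal.yaml (and, for older releases, a separate rbac.yaml) comes from the
# Canal installation page in the Project Calico docs.
kubectl apply -f canal.yaml

# Watch the canal and DNS pods come up.
kubectl get pods -n kube-system -w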
Now that networking is setup, our clusters should be in the ready state.
Provisioning worker nodes
We'll use a similar configuration to that of the masters and provision three t2.medium workers. As with the masters, I am using the Bionic image in the ap-south-1 region. Please note that we no longer have to allocate static IPs for the worker nodes; we can start treating worker nodes as cattle instead of pets. I am only doing so to identify which nodes are running which pods. This will be used to demonstrate node scheduling in subsequent sections.
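The launch loop mirrors the controller one; again the AMI ID is a placeholder, and the private IPs and worker-N names are just my way of telling the instances apart.
for i in 0 1 2; do
  INSTANCEID=$(aws ec2 run-instances \
    --image-id $AMI_ID \
    --instance-type t2.medium \
    --key-name $TAG \
    --security-group-ids $SGID \
    --subnet-id $SUBNETID \
    --associate-public-ip-address \
    --private-ip-address 10.10.128.2$i \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=32}' \
    | jq -r '.Instances[0].InstanceId')
  aws ec2 create-tags --resources $INSTANCEID --tags Key=Name,Value=worker-$i
  aws ec2 modify-instance-attribute --instance-id $INSTANCEID --no-source-dest-check
done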
Bootstrapping the worker nodes
We'll be following the instructions on the KubeAdm Install page, as we did when setting up the controllers, to install Docker and the Kubernetes tools. These steps are virtually identical to those for the masters.
Let's install Docker from Docker's repository, using the same commands as on the controllers. Run them as root on all three worker nodes.
Now let's install the Kubernetes components from the Kubernetes APT repository, again with the same commands as on the controllers, run as root on all three worker nodes.
Once that is done, we can join the cluster by running the kubeadm join command that we copied earlier.
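The token, CA cert hash, and load balancer address below are placeholders; use the values printed by kubeadm init on the first controller.
sudo kubeadm join <LOAD_BALANCER_DNS>:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>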
Now, back on the dev machine, we can run kubectl get nodes to verify that the worker nodes have joined the cluster. Give them a few minutes to get to the Ready state.
We have now set up a fully functioning HA cluster.
Cleaning Up
First, we delete all the controller and worker instances.
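Assuming the Name tags from the launch sketches:
IDS=$(aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=controller-*,worker-*" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].InstanceId' --output text)
aws ec2 terminate-instances --instance-ids $IDS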
We then delete the load balancer and the network resources.
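Roughly in reverse order of creation, reusing the variables from earlier; the instances must be fully terminated and the load balancer deleted before the dependent resources will let go, so allow a little time between steps.
aws elbv2 delete-load-balancer --load-balancer-arn $NLBARN
aws elbv2 delete-target-group --target-group-arn $TGARN

aws ec2 delete-key-pair --key-name $TAG
aws ec2 detach-internet-gateway --internet-gateway-id $IGWID --vpc-id $VPCID
aws ec2 delete-internet-gateway --internet-gateway-id $IGWID
aws ec2 delete-subnet --subnet-id $SUBNETID
aws ec2 delete-route-table --route-table-id $RTBID
aws ec2 delete-security-group --group-id $SGID
aws ec2 delete-vpc --vpc-id $VPCID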